Abstract

Following Hartigan [1975], a cluster is defined as a connected component of
the t-level set of the underlying density, i.e., the set of points for which the
density is greater than t. A clustering algorithm which combines a density
estimate with spectral clustering techniques is proposed. Our algorithm is 
composed of two steps. First, a nonparametric density estimate is used to 
extract the data points for which the estimated density takes a value greater 
than t. Next, the extracted points are clustered based on the eigenvectors 
of a graph Laplacian matrix. Under mild assumptions, we prove the almost 
sure convergence in operator norm of the empirical graph Laplacian opera- 
tor associated with the algorithm. Furthermore, we give the typical behavior 
of the representation of the dataset into the feature space, which establishes 
the strong consistency of our proposed algorithm.

1 Introduction



The aim of data clustering, or unsupervised classification, is to partition a data set
into several homogeneous groups that are well separated from one another with
respect to a certain distance or notion of similarity. There exists an extensive
literature on clustering methods, and we refer the reader to Anderberg [1973],
Hartigan [1975], McLachlan and Peel [2000], Chapter 10 in Duda et al. [2000],
and Chapter 14 in Hastie et al. [2001] for general materials on the subject. In
particular, popular clustering algorithms, such as Gaussian mixture models or
k-means, have proved useful in a number of applications, yet they suffer from some
internal and computational limitations. Indeed, the parametric assumption at the
core of mixture models may be too stringent, while the standard k-means
algorithm fails at identifying complex shaped, possibly non-convex, clusters.



The class of spectral clustering algorithms is presently emerging as a promising
alternative, showing improved performance over classical clustering algorithms
on several benchmark problems and applications; see e.g., Ng et al. [2002],
von Luxburg [2007]. An overview of spectral clustering algorithms may be found
in von Luxburg [2007], and connections with kernel methods are exposed in
Filippone et al. [2008]. The spectral clustering algorithm amounts to embedding
the data into a feature space by using the eigenvectors of the similarity matrix
in such a way that the clusters may be separated using simple rules, e.g., a
separation by hyperplanes. The core component of the spectral clustering
algorithm is therefore the similarity matrix, or certain normalizations of it,
generally called graph Laplacian matrices; see Chung [1997]. Graph Laplacian
matrices may be viewed as discrete versions of bounded operators between
functional spaces. The study of these operators has started out recently with the
works by Belkin et al. [2004], Belkin and Niyogi [2005], Coifman and Lafon [2006],
Nadler et al. [2006], Koltchinskii [1998], Giné and Koltchinskii [2006],
Hein et al. [2007], among others, and the convergence of the spectral clustering
algorithm has been established in von Luxburg et al. [2008].



The standard k-means clustering leads to the optimal quantizer of the underlying
distribution; see MacQueen [1967], Pollard [1981], Linder [2002]. However,
determining what the limit clustering obtained in von Luxburg et al. [2008]
represents for the distribution of the data remains largely an open question. As a
matter of fact, there exist many definitions of a cluster; see e.g., von Luxburg
and Ben-David [2005] or García-Escudero et al. [2008]. Perhaps the most intuitive
and precise definition of a cluster is the one introduced by Hartigan [1975].
Suppose that the data is drawn from a probability density $f$ on $\mathbb{R}^d$
and let $t$ be a positive number in the range of $f$. Then a cluster in the sense
of Hartigan [1975] is a connected component of the t-level set

$$\mathcal{L}(t) = \{x \in \mathbb{R}^d : f(x) \ge t\}.$$

This definition has several advantages. First, it is geometrically simple. Second, 
it offers the possibility of filtering out possibly meaningless clusters by keeping 
only the observations falling in a region of high density. This proves useful, for 
instance, in the situation where the data exhibits a cluster structure but is contami- 
nated by a uniform background noise, as illustrated in our simulations in Section 4. 

In this context, the level t should be considered as a resolution level for the data
analysis. Several clustering algorithms have been introduced building upon
Hartigan's definition. In Cuevas et al. [2001], clustering is performed by
estimating the connected components of $\mathcal{L}(t)$; see also the work by
Azzalini and Torelli [2007]. Hartigan's definition is also used in
Biau et al. [2007] to define an estimate of the number of clusters.



In the present paper, the definition of a cluster given by Hartigan [1975] is adopted,
and we introduce a spectral clustering algorithm on estimated level sets. More
precisely, given a random sample $X_1, \dots, X_n$ drawn from a density $f$ on
$\mathbb{R}^d$, our proposed algorithm is composed of two operations. In the first
step, given a positive number $t$, we extract the observations for which
$f_n(X_i) \ge t$, where $f_n$ is a nonparametric density estimate of $f$ based on
the sample $X_1, \dots, X_n$. In the second step, we perform a spectral clustering
of the extracted points. The remaining data points are then left unlabeled.



Our purpose is to study the asymptotic behavior of this algorithm. As mentioned
above, strong interest has recently been shown in spectral clustering algorithms,
and the major contribution to the proof of the convergence of spectral clustering
is certainly due to von Luxburg et al. [2008]. In von Luxburg et al. [2008], the
graph Laplacian matrix is associated with some random operator acting on the
Banach space of continuous functions. They prove the collectively compact
convergence of those operators towards a limit operator. Under mild assumptions, we
strengthen their results by establishing the almost sure convergence in operator
norm, but in a smaller Banach space (Theorem 3.1). This operator norm
convergence is more amenable than the slightly weaker notion of convergence
established in von Luxburg et al. [2008]. For instance, it is easy to check that the
limit operator, and the graph Laplacian matrices used in the algorithm, are
continuous in the scale parameter $h$.

We also derive the asymptotic representation of the dataset in the feature space
in Corollary 3.2. This result implies that the proposed algorithm is strongly
consistent and that, asymptotically, observations of $\mathcal{L}(t)$ are assigned
to the same cluster if and only if they fall in the same connected component of
the level set $\mathcal{L}(t)$.


The paper is organized as follows. In Section 2, we introduce some notations
and assumptions, as well as our proposed algorithm. Section 3 contains our main
results, namely the convergence in operator norm of the random operators, and
the characterization of the dataset embedded into the feature space. We provide
a numerical example with a simulated dataset in Section 4. Sections 5 and 6 are
devoted to the proofs. At the end of the paper, a technical result on the geometry
of level sets is stated in Appendix A, some useful results of functional analysis are
summarized in Appendix B, and the theoretical properties of the limit operator are
given in Appendix C.



2 Spectral clustering algorithm 
2.1 Mathematical setting and assumptions 

Let $\{X_i\}_{i \ge 1}$ be a sequence of i.i.d. random vectors in $\mathbb{R}^d$,
with common probability measure $\mu$. Suppose that $\mu$ admits a density $f$
with respect to the Lebesgue measure on $\mathbb{R}^d$. The t-level set of $f$ is
denoted by $\mathcal{L}(t)$, i.e.,

$$\mathcal{L}(t) = \{x \in \mathbb{R}^d : f(x) \ge t\},$$

for all positive level $t$, and given $a \le b$, $\mathcal{L}_a^b$ denotes the set
$\{x \in \mathbb{R}^d : a \le f(x) \le b\}$. The differentiation operator with
respect to $x$ is denoted by $D_x$. We assume that $f$ satisfies the following
conditions.

Assumption 1. (i) $f$ is of class $\mathcal{C}^2$ on $\mathbb{R}^d$;
(ii) $\|D_x f\| > 0$ on the set $\{x \in \mathbb{R}^d : f(x) = t\}$;
(iii) $f$, $D_x f$, and $D_x^2 f$ are uniformly bounded on $\mathbb{R}^d$.

Note that under Assumption 1, $\mathcal{L}(t)$ is compact whenever $t$ belongs to
the interior of the range of $f$. Moreover, $\mathcal{L}(t)$ has a finite number
$\ell$ of connected components $\mathcal{C}_j$, $j = 1, \dots, \ell$. For ease of
notation, the dependence of $\mathcal{C}_j$ on $t$ is omitted. The minimal distance
between the connected components of $\mathcal{L}(t)$ is denoted by $d_{\min}$,
i.e.,

$$d_{\min} = \inf_{i \ne j} \operatorname{dist}(\mathcal{C}_i, \mathcal{C}_j). \qquad (2.1)$$

Let $f_n$ be a consistent density estimate of $f$ based on the random sample
$X_1, \dots, X_n$. The t-level set of $f_n$ is denoted by $\mathcal{L}_n(t)$, i.e.,

$$\mathcal{L}_n(t) = \{x \in \mathbb{R}^d : f_n(x) \ge t\}.$$

Let $J(n)$ be the set of integers defined by

$$J(n) = \{j \in \{1, \dots, n\} : f_n(X_j) \ge t\}.$$

The cardinality of $J(n)$ is denoted by $j(n)$.
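
The extraction step can be illustrated numerically. The following is a minimal sketch, assuming a Gaussian kernel density estimate with a hand-picked bandwidth; the function names, the bandwidth value, the level and the toy data are all hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

def gaussian_kde_at_sample(X, bandwidth):
    """Evaluate a Gaussian kernel density estimate f_n at each sample point.

    X has shape (n, d); bandwidth is a hypothetical smoothing parameter.
    """
    n, d = X.shape
    # Pairwise squared Euclidean distances between sample points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    norm_const = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2)).sum(axis=1) / (n * norm_const)

def extract_level_set_indices(X, t, bandwidth):
    """Return J(n) = {j : f_n(X_j) >= t} as an array of indices."""
    density = gaussian_kde_at_sample(X, bandwidth)
    return np.flatnonzero(density >= t)

rng = np.random.default_rng(0)
# Toy data: a tight Gaussian cluster plus sparse uniform background noise.
cluster = rng.normal(0.0, 0.2, size=(200, 2))
noise = rng.uniform(-3.0, 3.0, size=(50, 2))
X = np.vstack([cluster, noise])

J_n = extract_level_set_indices(X, t=0.1, bandwidth=0.3)
print(len(J_n), "points extracted out of", len(X))
```

Only the indices in `J_n` are passed on to the spectral step; the other points stay unlabeled.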

Let $k : \mathbb{R}^d \to \mathbb{R}_+$ be a fixed function. The unit ball of
$\mathbb{R}^d$ centered at the origin is denoted by $B$, and the ball centered at
$x \in \mathbb{R}^d$ and of radius $r$ is denoted by $x + rB$. We assume
throughout that the function $k$ satisfies the following set of conditions.

Assumption 2. (i) $k$ is of class $\mathcal{C}^2$ on $\mathbb{R}^d$; (ii) the
support of $k$ is $B$; (iii) $k$ is uniformly bounded from below on $B/2$ by some
positive number; and (iv) $k(-x) = k(x)$ for all $x \in \mathbb{R}^d$.

Let $h$ be a positive number. We denote by $k_h : \mathbb{R}^d \to \mathbb{R}_+$
the map defined by $k_h(u) = k(u/h)$.
2.2 Algorithm 

The first ingredient of our algorithm is the similarity matrix $\mathbf{K}_{n,h}$,
whose elements are given by

$$\mathbf{K}_{n,h}(i,j) = k_h(X_j - X_i),$$

and where the integers $i$ and $j$ range over the random set $J(n)$. Hence
$\mathbf{K}_{n,h}$ is a random matrix indexed by $J(n) \times J(n)$, whose values
depend on the function $k_h$, and on the observations $X_j$ lying in the estimated
level set $\mathcal{L}_n(t)$. Next, we introduce the diagonal normalization matrix
$\mathbf{D}_{n,h}$ whose diagonal entries are given by

$$\mathbf{D}_{n,h}(i,i) = \sum_{j \in J(n)} \mathbf{K}_{n,h}(i,j), \qquad i \in J(n).$$

Note that the diagonal elements of $\mathbf{D}_{n,h}$ are positive.

The spectral clustering algorithm is based on the matrix $\mathbf{Q}_{n,h}$ defined by

$$\mathbf{Q}_{n,h} = \mathbf{D}_{n,h}^{-1} \mathbf{K}_{n,h}.$$

Observe that $\mathbf{Q}_{n,h}$ is a random Markovian transition matrix. Note also
that the (random) eigenvalues of $\mathbf{Q}_{n,h}$ are real numbers and that
$\mathbf{Q}_{n,h}$ is diagonalizable. Indeed the matrix $\mathbf{Q}_{n,h}$ is
conjugate to the symmetric matrix
$\mathbf{S}_{n,h} := \mathbf{D}_{n,h}^{-1/2} \mathbf{K}_{n,h} \mathbf{D}_{n,h}^{-1/2}$
since we may write

$$\mathbf{Q}_{n,h} = \mathbf{D}_{n,h}^{-1/2} \mathbf{S}_{n,h} \mathbf{D}_{n,h}^{1/2}.$$

Moreover, the inequality $\|\mathbf{Q}_{n,h}\|_\infty \le 1$ implies that the
spectrum $\sigma(\mathbf{Q}_{n,h})$ is a subset of $[-1, +1]$. Let
$1 = \lambda_{n,1} \ge \lambda_{n,2} \ge \dots \ge \lambda_{n,j(n)} \ge -1$ be the
eigenvalues of $\mathbf{Q}_{n,h}$, where in this enumeration, an eigenvalue is
repeated as many times as its multiplicity.

To implement the spectral clustering algorithm, the data points of the partitioning
problem are first embedded into $\mathbb{R}^\ell$ by using the eigenvectors of
$\mathbf{Q}_{n,h}$ associated with the $\ell$ largest eigenvalues, namely
$\lambda_{n,1}, \lambda_{n,2}, \dots, \lambda_{n,\ell}$. More precisely, fix a
collection $V_{n,1}, V_{n,2}, \dots, V_{n,\ell}$ of such eigenvectors with
components respectively given by $V_{n,k} = \{V_{n,k,j}\}_{j \in J(n)}$, for
$k = 1, \dots, \ell$. Then the $j$-th data point, for $j$ in $J(n)$, is represented
by the vector $\rho_n(X_j)$ of $\mathbb{R}^\ell$ defined by
$\rho_n(X_j) := \{V_{n,k,j}\}_{1 \le k \le \ell}$. At last, the embedded points are
partitioned using a classical clustering method, such as the k-means algorithm
for instance.
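
The construction above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the bump kernel, the helper names `bump_kernel` and `spectral_embedding`, and the toy data are hypothetical. The eigenvectors are computed from the symmetric matrix S and mapped back to eigenvectors of Q through the conjugation relation, and the final k-means step is left out.

```python
import numpy as np

def bump_kernel(u):
    """Symmetric kernel supported on the unit ball (an illustrative choice of k)."""
    r = np.linalg.norm(u, axis=-1)
    out = np.zeros_like(r)
    inside = r < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - r[inside] ** 2))
    return out

def spectral_embedding(X, h, n_clusters):
    """Embed the rows of X (the extracted points) into R^n_clusters."""
    diffs = X[None, :, :] - X[:, None, :]       # diffs[i, j] = X_j - X_i
    K = bump_kernel(diffs / h)                  # K(i, j) = k_h(X_j - X_i)
    deg = K.sum(axis=1)                         # diagonal of D; positive since k(0) > 0
    S = K / np.sqrt(np.outer(deg, deg))         # S = D^{-1/2} K D^{-1/2}
    eigvals, U = np.linalg.eigh(S)              # real eigenvalues, ascending order
    top = np.argsort(eigvals)[::-1][:n_clusters]
    # If S u = lambda u, then Q (D^{-1/2} u) = lambda (D^{-1/2} u).
    V = U[:, top] / np.sqrt(deg)[:, None]
    return eigvals[top], V

rng = np.random.default_rng(1)
blob_a = rng.normal([0.0, 0.0], 0.1, size=(40, 2))
blob_b = rng.normal([5.0, 0.0], 0.1, size=(40, 2))
X = np.vstack([blob_a, blob_b])

eigvals, V = spectral_embedding(X, h=1.0, n_clusters=2)
print(eigvals)   # the rows of V are the embedded points rho_n(X_j)
```

Working with S rather than Q keeps the eigenproblem symmetric, which is the same device used for numerical stability in the simulations of Section 4.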

2.3 Functional operators associated with the matrices of the 
algorithm 

As exposed in the Introduction, some functional operators are associated with the
matrices acting on $\mathbb{C}^{J(n)}$ defined in the previous paragraph. The link
between matrices and functional operators is provided by the evaluation map
defined in (2.3) below. As a consequence, asymptotic results on the clustering
algorithm may be derived by studying first the limit behavior of these operators.

To this aim, let us first introduce some additional notation. For $\mathcal{D}$ a
subset of $\mathbb{R}^d$, let $W(\mathcal{D})$ be the Banach space of
complex-valued, bounded, and continuously differentiable functions on
$\mathcal{D}$ with bounded gradient, endowed with the norm

$$\|g\|_W = \|g\|_\infty + \|D_x g\|_\infty.$$



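Consider the non-oriented graph whose vertices are the $X_j$'s for $j$ ranging in
$J(n)$. The similarity matrix $\mathbf{K}_{n,h}$ gives random weights to the edges
of the graph and the random transition matrix $\mathbf{Q}_{n,h}$ defines a random
walk on the vertices of a random graph. Associated with this random walk is the
transition operator $Q_{n,h} : W(\mathcal{L}_n(t)) \to W(\mathcal{L}_n(t))$ defined
for any function $g$ by

$$Q_{n,h} g(x) = \int_{\mathcal{L}_n(t)} q_{n,h}(x,y)\, g(y)\, \mathbb{P}_n^t(dy).$$

In this equation, $\mathbb{P}_n^t$ is the discrete random probability measure given by

$$\mathbb{P}_n^t = \frac{1}{j(n)} \sum_{j \in J(n)} \delta_{X_j},$$

and

$$q_{n,h}(x,y) = \frac{k_h(y - x)}{K_{n,h}(x)}, \quad \text{where } K_{n,h}(x) = \int_{\mathcal{L}_n(t)} k_h(y - x)\, \mathbb{P}_n^t(dy). \qquad (2.2)$$

In the definition of $q_{n,h}$, we use the convention that $0/0 = 0$, but this
situation does not occur in the proofs of our results.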



Given the evaluation map $\pi_n : W(\mathcal{L}_n(t)) \to \mathbb{C}^{J(n)}$ defined by

$$\pi_n(g) = \{g(X_j) : j \in J(n)\}, \qquad (2.3)$$

the matrix $\mathbf{Q}_{n,h}$ and the operator $Q_{n,h}$ are related by
$\mathbf{Q}_{n,h} \circ \pi_n = \pi_n \circ Q_{n,h}$. Using this relation,
asymptotic properties of the spectral clustering algorithm may be deduced from the
limit behavior of the sequence of operators $\{Q_{n,h}\}_n$. The difficulty, though,
is that $Q_{n,h}$ acts on $W(\mathcal{L}_n(t))$ and $\mathcal{L}_n(t)$ is a random
set which varies with the sample. For this reason, we introduce a sequence of
operators $\widehat{Q}_{n,h}$ acting on $W(\mathcal{L}(t))$ and constructed from
$Q_{n,h}$ as follows.
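
The intertwining relation between the transition matrix and the transition operator through the evaluation map can be checked numerically. The sketch below is illustrative only: the kernel, the test function `g`, the helper names and the sample are hypothetical, and the points in `X` play the role of the extracted observations.

```python
import numpy as np

def k(u):
    """Bump kernel supported on the unit ball (illustrative choice)."""
    r = np.linalg.norm(u, axis=-1)
    out = np.zeros_like(r)
    inside = r < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - r[inside] ** 2))
    return out

def operator_Q(g, X, h):
    """Return the function x -> (Q_{n,h} g)(x) built from the extracted points X."""
    def Qg(x):
        w = k((X - x) / h)                  # k_h(X_j - x) for each j
        return np.dot(w, g(X)) / w.sum()    # the 1/j(n) factors cancel in the ratio
    return Qg

def matrix_Q(X, h):
    """Row-normalized similarity matrix Q = D^{-1} K."""
    K = k((X[None, :, :] - X[:, None, :]) / h)
    return K / K.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(30, 2))
h = 0.5
g = lambda x: np.sin(3.0 * x[..., 0]) + x[..., 1] ** 2

lhs = matrix_Q(X, h) @ g(X)                            # matrix applied to pi_n(g)
rhs = np.array([operator_Q(g, X, h)(x) for x in X])    # pi_n applied to Q_{n,h} g
print(np.max(np.abs(lhs - rhs)))
```

Both sides agree up to floating-point rounding, since evaluating the operator at a sample point reproduces the corresponding row of the matrix.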



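First of all, recall that under Assumption 1, the gradient of $f$ does not vanish on
the set $\{x \in \mathbb{R}^d : f(x) = t\}$. Since $f$ is of class $\mathcal{C}^2$,
a continuity argument implies that there exists $\varepsilon_0 > 0$ such that
$\mathcal{L}_{t-\varepsilon_0}^{t+\varepsilon_0}$ contains no critical points of
$f$. Under this condition, Lemma A.1 states that $\mathcal{L}(t + \varepsilon)$ is
diffeomorphic to $\mathcal{L}(t)$ for every $\varepsilon$ such that
$|\varepsilon| \le \varepsilon_0$. In all of the following, it is assumed that
$\varepsilon_0$ is small enough so that

$$\varepsilon_0 / \alpha(\varepsilon_0) < h/2, \quad \text{where } \alpha(\varepsilon_0) = \inf\{\|D_x f(x)\| : x \in \mathcal{L}_{t-\varepsilon_0}^{t}\}. \qquad (2.4)$$

Let $\{\varepsilon_n\}_n$ be a sequence of positive numbers such that
$\varepsilon_n \le \varepsilon_0$ for each $n$, and $\varepsilon_n \to 0$ as
$n \to \infty$. In Lemma A.1 an explicit diffeomorphism $\varphi_n$ carrying
$\mathcal{L}(t)$ to $\mathcal{L}(t - \varepsilon_n)$ is constructed, i.e.,

$$\varphi_n : \mathcal{L}(t) \xrightarrow{\ \cong\ } \mathcal{L}(t - \varepsilon_n). \qquad (2.5)$$

The diffeomorphism $\varphi_n$ induces the linear operator
$\Phi_n : W(\mathcal{L}(t)) \to W(\mathcal{L}(t - \varepsilon_n))$ defined by
$\Phi_n g = g \circ \varphi_n^{-1}$.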



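Second, let $\Omega_n$ be the probability event defined by

$$\Omega_n = \Bigl[\|f_n - f\|_\infty \le \varepsilon_n\Bigr] \cap \Bigl[\inf\bigl\{\|D_x f_n(x)\| : x \in \mathcal{L}_{t-\varepsilon_0}^{t+\varepsilon_0}\bigr\} \ge \tfrac{1}{2}\|D_x f\|_\infty\Bigr]. \qquad (2.6)$$

Note that on the event $\Omega_n$, the following inclusions hold:

$$\mathcal{L}(t + \varepsilon_n) \subset \mathcal{L}_n(t) \subset \mathcal{L}(t - \varepsilon_n). \qquad (2.7)$$

Indeed, if $\|f_n - f\|_\infty \le \varepsilon_n$, then $f(x) \ge t + \varepsilon_n$
implies $f_n(x) \ge t$, and $f_n(x) \ge t$ implies $f(x) \ge t - \varepsilon_n$.
We assume that the indicator function $\mathbf{1}_{\Omega_n}$ tends to 1 almost
surely as $n \to \infty$, which is satisfied by common density estimates $f_n$
under mild assumptions. For instance, consider a kernel density estimate with a
Gaussian kernel. Then for a density $f$ satisfying the conditions in Assumption 1,
we have $\|D_x^p f_n - D_x^p f\|_\infty \to 0$ almost surely as $n \to \infty$, for
$p = 0$ and $p = 1$ (see e.g., Prakasa Rao [1983]), which implies that
$\mathbf{1}_{\Omega_n} \to 1$ almost surely as $n \to \infty$.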



We are now in a position to introduce the operator
$\widehat{Q}_{n,h} : W(\mathcal{L}(t)) \to W(\mathcal{L}(t))$ defined on the event
$\Omega_n$ by

$$\widehat{Q}_{n,h} = \Phi_n^{-1} Q_{n,h} \Phi_n, \qquad (2.8)$$

and we extend the definition of $\widehat{Q}_{n,h}$ to the whole probability space
by setting it to the null operator on the complement $\Omega_n^c$ of $\Omega_n$. In
other words, on $\Omega_n^c$, the function $\widehat{Q}_{n,h} g$ is identically
zero for each $g \in W(\mathcal{L}(t))$.

Remark 2.1. Although the relevant part of $\widehat{Q}_{n,h}$ is defined on
$\Omega_n$ for technical reasons, this does not bring any difficulty as long as one
is concerned with almost sure convergence. To see this, let
$(\Omega, \mathcal{A}, P)$ be the probability space on which the $X_i$'s are
defined. Denote by $\Omega_\infty$ the event on which $\mathbf{1}_{\Omega_n}$ tends
to 1, and recall that $P(\Omega_\infty) = 1$ by assumption. Thus, for every
$\omega \in \Omega_\infty$, there exists a random integer $n_0(\omega)$ such that,
for each $n \ge n_0(\omega)$, $\omega$ lies in $\Omega_n$. Besides, $n_0(\omega)$ is
finite on $\Omega_\infty$. Hence in particular, if $\{Z_n\}$ is a sequence of
random variables such that $Z_n \mathbf{1}_{\Omega_n}$ converges almost surely to
some random variable $Z_\infty$, then $Z_n \to Z_\infty$ almost surely.



3 Main results 

Our main result (Theorem 3.1) states that $\widehat{Q}_{n,h}$ converges in operator
norm to the limit operator $Q_h : W(\mathcal{L}(t)) \to W(\mathcal{L}(t))$ defined by

$$Q_h g(x) = \int_{\mathcal{L}(t)} q_h(x,y)\, g(y)\, \mu^t(dy), \qquad (3.1)$$

where $\mu^t$ denotes the conditional distribution of $X$ given the event
$[X \in \mathcal{L}(t)]$, and where

$$q_h(x,y) = \frac{k_h(y - x)}{K_h(x)}, \quad \text{with } K_h(x) = \int_{\mathcal{L}(t)} k_h(y - x)\, \mu^t(dy). \qquad (3.2)$$

Theorem 3.1 (Operator Norm Convergence). Suppose that Assumptions 1 and 2
hold. We have

$$\|\widehat{Q}_{n,h} - Q_h\|_W \to 0 \quad \text{almost surely as } n \to \infty.$$

The proof of Theorem 3.1 is given in Paragraph 5.2. Its main arguments are as
follows. First, the three classes of functions defined in Lemma 5.2 are shown
to be Glivenko-Cantelli. This, together with additional technical results, leads to
uniform convergences of some linear operators (Lemma 5.6).

Theorem 3.1 implies the consistency of our algorithm. We recall that $d_{\min}$
given in (2.1) is the minimal distance between the connected components of the
level set. The starting point is the fact that, provided that $h < d_{\min}$, the
connected components of the level set $\mathcal{L}(t)$ are the recurrent classes of
the Markov chain whose transitions are given by $Q_h$. Indeed, this process cannot
jump from one component to the other ones. Hence, $Q_h$ defines the desired
clustering via its eigenspace corresponding to the eigenvalue 1.
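
A finite-sample analogue of this observation is easy to reproduce. The sketch below is a hypothetical toy example, not one of the paper's experiments: two groups of points on the real line are separated by a gap larger than h, so the transition matrix has no mass between the groups and the eigenvalue 1 appears once per group.

```python
import numpy as np

def k(u):
    """Bump kernel supported on the unit ball (illustrative choice)."""
    r = np.linalg.norm(u, axis=-1)
    out = np.zeros_like(r)
    inside = r < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - r[inside] ** 2))
    return out

# Two groups of points on the line, at distance 2 from each other.
X = np.concatenate([np.linspace(0.0, 1.0, 11), np.linspace(3.0, 4.0, 11)])[:, None]
labels = np.repeat([0, 1], 11)
h = 0.5                                   # smaller than the gap between the groups

K = k((X[None, :, :] - X[:, None, :]) / h)
Q = K / K.sum(axis=1, keepdims=True)      # Markov transition matrix

# Transition probabilities between points of different groups.
cross = Q[labels[:, None] != labels[None, :]]

# Eigenvalues of the symmetric conjugate S coincide with those of Q.
deg = K.sum(axis=1)
S = K / np.sqrt(np.outer(deg, deg))
eigvals = np.linalg.eigvalsh(S)
n_unit = int(np.sum(eigvals > 1.0 - 1e-8))
print(cross.max(), n_unit)
```

Here `cross.max()` is exactly zero because the kernel has compact support, and `n_unit` counts the eigenvalues numerically equal to 1.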

As stated in Proposition C.2 in the Appendices, the eigenspace of the limit
operator $Q_h$ associated with the eigenvalue 1 is spanned by the indicator
functions of the connected components of $\mathcal{L}(t)$. Hence the representation
of the extracted part of the dataset into the feature space (see the end of
Paragraph 2.2) tends to concentrate around $\ell$ different centroids. Moreover,
each of these centroids corresponds to a cluster, i.e., to a connected component of
$\mathcal{L}(t)$.

More precisely, using the convergence in operator norm of $\widehat{Q}_{n,h}$
towards $Q_h$, together with the results of functional analysis given in
Appendix B, we obtain the following corollary which describes the asymptotic
behavior of our algorithm. Let us denote by $J(\infty)$ the set of integers $j$
such that $X_j$ is in the level set $\mathcal{L}(t)$. For all $j \in J(\infty)$,
define $k(j)$ as the integer such that $X_j \in \mathcal{C}_{k(j)}$.

Corollary 3.2. Suppose that Assumptions 1 and 2 hold, and that $h$ is in
$(0, d_{\min})$. There exists a sequence $\{\xi_n\}_n$ of linear transformations of
$\mathbb{R}^\ell$ such that, for all $j \in J(\infty)$, $\xi_n \rho_n(X_j)$
converges almost surely to $e_{k(j)}$, where $e_{k(j)}$ is the vector of
$\mathbb{R}^\ell$ whose components are all 0 except the $k(j)$-th component equal
to 1.
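
The role of the linear transformations in Corollary 3.2 can be illustrated on a toy sample. In the sketch below, the three well-separated groups, the kernel and the least-squares fit of the linear map are all hypothetical illustration choices; in particular, the map is fitted using the true labels, which is only meant to show that such a linear map exists, not how one would cluster in practice.

```python
import numpy as np

def k(u):
    """Bump kernel supported on the unit ball (illustrative choice)."""
    r = np.linalg.norm(u, axis=-1)
    out = np.zeros_like(r)
    inside = r < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - r[inside] ** 2))
    return out

rng = np.random.default_rng(3)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
labels = np.repeat(np.arange(3), 30)
X = centers[labels] + rng.normal(0.0, 0.1, size=(90, 2))

h = 1.0                                      # below the distance between groups
K = k((X[None, :, :] - X[:, None, :]) / h)
deg = K.sum(axis=1)
S = K / np.sqrt(np.outer(deg, deg))
eigvals, U = np.linalg.eigh(S)               # ascending eigenvalues
V = U[:, -3:] / np.sqrt(deg)[:, None]        # rows are the embedded points rho_n(X_j)

E = np.eye(3)[labels]                        # target vectors e_{k(j)}
Xi, *_ = np.linalg.lstsq(V, E, rcond=None)   # linear map fitted by least squares
mapped = V @ Xi
print(np.max(np.abs(mapped - E)))
```

Because the eigenvectors for the eigenvalue 1 span the indicator vectors of the groups, a single linear map sends every embedded point onto the canonical basis vector of its group, up to rounding error.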

Corollary 3.2, which is new to the best of our knowledge, is proved in Section 6.
Corollary 3.2 states that the data points embedded in the feature space concentrate
on separated centroids. As a consequence, any partitioning algorithm (e.g., k-means)
applied in the feature space will asymptotically yield the desired clustering. In
other words, the clustering algorithm is consistent. Note that if one is only
interested in the consistency property, then this result could be obtained through
another route. Indeed, it is shown in Biau et al. [2007] that the neighborhood graph
with connectivity radius $h$ has asymptotically the same number of connected
components as the level set. Hence, splitting the graph into its connected components
leads to the desired clustering as well. But Corollary 3.2, by giving the
asymptotic representation of the data when embedded in the feature space
$\mathbb{R}^\ell$, provides additional insight into spectral clustering algorithms.
In particular, Corollary 3.2 provides a rationale for the heuristic of Zelnik-Manor
and Perona [2004] for automatic selection of the number of groups. Their idea is to
quantify the amount of concentration of the points embedded in the feature space,
and to select the number of groups leading to the maximal concentration. Their
method compared favorably with the eigengap heuristic considered in
von Luxburg [2007].



Naturally, the selection of the number of groups is also linked with the choice of
the parameter $h$. In this direction, let us emphasize that the operators
$\widehat{Q}_{n,h}$ and $Q_h$ depend continuously on the scale parameter $h$. Thus,
the spectral properties of both operators will be close to the ones stated in
Corollary 3.2 if $h$ is in the neighborhood of the interval $(0, d_{\min})$. This
follows from the continuity of an isolated set of eigenvalues, as stated in
Appendix B. In particular, the sum of the eigenspaces of $Q_h$ associated with the
eigenvalues close to 1 is spanned by functions that are close to (in
$W(\mathcal{L}(t))$-norm) the indicator functions of the connected components of
$\mathcal{L}(t)$. Hence, the representation of the dataset in the feature space
still concentrates on some neighborhoods of $e_k$, $1 \le k \le \ell$, and a simple
clustering algorithm such as k-means will still give the desired result. To sum up
the above, if Assumptions 1 and 2 hold, our algorithm is consistent for all $h$ in
$(0, h_{\max})$ for some $h_{\max} > d_{\min}$.

Several questions, though, remain largely open. For instance, one might ask if a
similar result holds for the classical spectral clustering algorithm, i.e., without the
preprocessing step. This case corresponds to taking $t = 0$. One possibility may
then be to consider a sequence $h_n$, with $\lim h_n = 0$, and to study the limit of
the operator $Q_{n,h_n}$.

4 Simulations 

We consider a mixture density on $\mathbb{R}^2$ with four components corresponding
to random variables $X_1, \dots, X_4$ where

(i) $X_1 \sim \mathcal{N}(0, \sigma_1^2 I)$ with $\sigma_1 = 0.2$;

(ii) $X_2 = R_2(\cos\theta_2, \sin\theta_2)$ where $\theta_2 \sim \mathcal{U}([0, 2\pi])$ and $R_2 \sim \mathcal{N}(1, 0.1^2)$;

(iii) $X_3 = R_3(\cos\theta_3, \sin\theta_3)$ where $\theta_3 \sim \mathcal{U}([0, 2\pi])$ and $R_3 \sim \mathcal{N}(2, 0.2^2)$;

(iv) $X_4 \sim \mathcal{U}([-3, 3] \times [-3, 3])$.

Figure 1: Left: simulated points. Right: points belonging to the estimated level set (red
triangles) and remaining points (dark crosses).

The proportions of the components in the mixture are taken as 10%, 32%, 53%
and 5%, respectively. The fourth component ($X_4$) represents a uniform
background noise.
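
A sample from this mixture can be drawn as in the following sketch. The function name `sample_mixture` and the seed are hypothetical; the component parameters and proportions are those listed above.

```python
import numpy as np

def sample_mixture(n, rng):
    """Draw n points from the four-component mixture; also return component labels."""
    weights = np.array([0.10, 0.32, 0.53, 0.05])
    comp = rng.choice(4, size=n, p=weights)
    X = np.empty((n, 2))

    # (i) centred isotropic Gaussian with standard deviation 0.2
    m = comp == 0
    X[m] = rng.normal(0.0, 0.2, size=(m.sum(), 2))

    # (ii) and (iii): rings with Gaussian radius and uniform angle
    for c, (mean_r, sd_r) in ((1, (1.0, 0.1)), (2, (2.0, 0.2))):
        m = comp == c
        theta = rng.uniform(0.0, 2.0 * np.pi, size=m.sum())
        radius = rng.normal(mean_r, sd_r, size=m.sum())
        X[m] = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])

    # (iv) uniform background noise on the square [-3, 3] x [-3, 3]
    m = comp == 3
    X[m] = rng.uniform(-3.0, 3.0, size=(m.sum(), 2))
    return X, comp

rng = np.random.default_rng(0)
X, comp = sample_mixture(1900, rng)
print(X.shape, np.bincount(comp, minlength=4) / len(comp))
```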

A random sample of size $n = 1{,}900$ has been simulated according to the mixture.
Points are displayed in Figure 1 (left). A nonparametric kernel density estimate,
with a Gaussian kernel, has been fitted to the data. The bandwidth parameter
of the density estimate has been selected automatically with cross-validation. A
level $t = 0.0444$ has been selected such that 85% of the simulated points are
extracted, i.e., 85% of the observations fall in $\mathcal{L}_n(t)$. The extracted
and discarded points are displayed in Figure 1 (right). The number of extracted
points is equal to 1,615.
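
One simple way to pick a level that retains a prescribed fraction of the sample is to take an empirical quantile of the estimated density values at the sample points. This is a sketch of that idea only; the paper does not state how its level was computed, and the density values below are placeholder numbers rather than an actual density estimate.

```python
import numpy as np

def level_for_fraction(density_at_sample, keep_fraction):
    """Level t such that roughly keep_fraction of the points satisfy f_n(X_i) >= t."""
    return np.quantile(density_at_sample, 1.0 - keep_fraction)

rng = np.random.default_rng(0)
# Placeholder for the values f_n(X_1), ..., f_n(X_n) of a fitted density estimate.
density_at_sample = rng.gamma(2.0, 0.05, size=1900)

t = level_for_fraction(density_at_sample, 0.85)
kept = np.flatnonzero(density_at_sample >= t)
print(t, len(kept) / len(density_at_sample))
```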

The spectral clustering has been applied to the 1,615 extracted points, with the
similarity function

$$k(x) = \exp\bigl(-1/(1 - \|x\|^2)\bigr)\, \mathbf{1}\{\|x\| < 1\}.$$

For numerical stability of the algorithm, we considered the eigendecomposition
of the symmetric matrix $I - \mathbf{S}_{n,h}$. Thus, the eigenspace associated
with the eigenvalue 1 of the matrix $\mathbf{Q}_{n,h}$ corresponds to the null space
of $I - \mathbf{S}_{n,h}$. The scale parameter $h$ has been empirically chosen equal
to 0.25. The first 10 eigenvalues of $I - \mathbf{S}_{n,h}$ are represented in
Figure 2 (top-left). Three eigenvalues are found equal to zero, indicating three
distinct groups. The data is then embedded in $\mathbb{R}^3$ using the three
eigenvectors of the null space of $I - \mathbf{S}_{n,h}$, and the data is
partitioned in this space using a k-means clustering algorithm. Pair plots of three
eigenvectors of the null space are displayed in Figure 2. It may be observed that
the embedded data are concentrated around three distinct points in the feature
space. Applying a k-means algorithm in the feature space leads to the partition
represented in Figure 2. Note that observations considered as background noise are
the discarded points belonging to the complement of $\mathcal{L}_n(t)$. In this
example, our algorithm is successful at recovering the three expected groups.

Figure 2: Top left: first 10 eigenvalues, sorted in ascending order. Top right: pairs plots
of the first three eigenvectors. It may be seen that the embedded data
concentrate around three distinct points in the feature space $\mathbb{R}^3$.
Bottom: resulting partition obtained by applying a k-means algorithm in the
feature space. The color scheme is identical to the representation of the
eigenvectors (top-right panel). The three groups are accurately recovered.

Figure 3: First 50 eigenvalues of the standard spectral clustering algorithm, applied on the
initial data set, i.e., without level set pre-processing. A total of 35 eigenvalues
are found equal to zero, which leads to 35 inhomogeneous groups, indicating
failure of the standard spectral clustering algorithm.

As a comparison, we applied the standard spectral clustering algorithm to the
initial data set of size $n = 1{,}900$. In this case, 35 eigenvalues are found equal
to zero (Figure 3). Applying a k-means clustering algorithm in the embedding space
$\mathbb{R}^{35}$ leads to 35 inhomogeneous groups (not displayed here), none of
which corresponds even roughly to the expected groups (the two circular bands and
the inner circle). This failure of the standard spectral clustering algorithm is
explained by the presence of the background noise which, when unfiltered, perturbs
the formation of distinct groups. While there remain multiple important questions,
in particular regarding the choice of the parameter $h$, these simulations
illustrate the added value of combining a spectral clustering algorithm with
level-set techniques.



5 Proof of the convergence of $\widehat{Q}_{n,h}$ (Theorem 3.1)

5.1 Preliminaries 

Let us start with the following simple lemma. 

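Lemma 5.1. Let $\{A_n\}_{n \ge 0}$ be a decreasing sequence of Borel sets in
$\mathbb{R}^d$, with limit $A_\infty = \bigcap_{n \ge 0} A_n$. If
$\mu(A_\infty) = 0$, then

$$\mathbb{P}_n A_n = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \in A_n\} \to 0 \quad \text{almost surely as } n \to \infty,$$

where $\mathbb{P}_n$ is the empirical measure associated with the random sample
$X_1, \dots, X_n$.

Proof. First, note that $\lim_n \mu(A_n) = \mu(A_\infty)$. Next, fix an integer
$k$. For all $n \ge k$, $A_n \subset A_k$ and so
$\mathbb{P}_n A_n \le \mathbb{P}_n A_k$. But $\lim_n \mathbb{P}_n A_k = \mu(A_k)$
almost surely by the law of large numbers. Consequently
$\limsup_n \mathbb{P}_n A_n \le \mu(A_k)$ almost surely. Letting $k \to \infty$
yields

$$\limsup_n \mathbb{P}_n A_n \le \mu(A_\infty) = 0,$$

which concludes the proof since $\mathbb{P}_n A_n \ge 0$. □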
The operator norm convergence that we expect to prove is a uniform law of large
numbers. The key argument is the fact that the classes of functions of the following
lemma are Glivenko-Cantelli. Let $g$ be a function defined on some subset
$\mathcal{D}$ of $\mathbb{R}^d$, and let $\mathcal{A}$ be a subset of
$\mathcal{D}$. In what follows, for all $x \in \mathbb{R}^d$, the notation
$g(x)\mathbf{1}_{\mathcal{A}}(x)$ stands for $g(x)$ if $x \in \mathcal{A}$ and 0
otherwise.

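Lemma 5.2. 1. The two collections of functions

$$\mathcal{F}_1 := \{y \mapsto k_h(y - x)\, \mathbf{1}_{\mathcal{L}(t)}(y) : x \in \mathcal{L}(t - \varepsilon_0)\},$$
$$\mathcal{F}_2 := \{y \mapsto D_x k_h(y - x)\, \mathbf{1}_{\mathcal{L}(t)}(y) : x \in \mathcal{L}(t - \varepsilon_0)\},$$

are Glivenko-Cantelli, where $D_x k_h$ denotes the differential of $k_h$.

2. Let $r : \mathcal{L}(t) \times \mathbb{R}^d \to \mathbb{R}$ be a continuously
differentiable function such that

(i) there exists a compact $\mathcal{K} \subset \mathbb{R}^d$ such that $r(x,y) = 0$ for all $(x,y) \in \mathcal{L}(t) \times \mathcal{K}^c$;

(ii) $r$ is uniformly bounded on $\mathcal{L}(t) \times \mathbb{R}^d$, i.e., $\|r\|_\infty < \infty$.

Then the collection of functions

$$\mathcal{F}_3 := \{y \mapsto r(x,y)\, g(y)\, \mathbf{1}_{\mathcal{L}(t)}(y) : x \in \mathcal{L}(t),\ \|g\|_{W(\mathcal{L}(t))} \le 1\}$$

is Glivenko-Cantelli.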



Proof. 1. Clearly $\mathcal{F}_1$ has an integrable envelope since $k_h$ is
uniformly bounded. Moreover, for each fixed $y$, the map
$x \mapsto k_h(y - x)\mathbf{1}_{\mathcal{L}(t)}(y)$ is continuous, and
$\mathcal{L}(t - \varepsilon_0)$ is compact. Hence for each $\delta > 0$, using a
finite covering of $\mathcal{L}(t - \varepsilon_0)$, it is easy to construct
finitely many $L^1$ brackets of size at most $\delta$ whose union covers
$\mathcal{F}_1$; see e.g., Example 19.8 in van der Vaart [1998]. So $\mathcal{F}_1$
is Glivenko-Cantelli. Since $k_h$ is continuously differentiable and with compact
support, the same arguments apply to each component of $D_x k_h$, and so
$\mathcal{F}_2$ is also a Glivenko-Cantelli class.

2. Set $\mathcal{R} = \{y \mapsto r(x,y) : x \in \mathcal{L}(t)\}$. First, since
$r$ is continuous on the compact set $\mathcal{L}(t) \times \mathcal{K}$, it is
uniformly continuous. So a finite covering of $\mathcal{R}$ of arbitrary size in
the supremum norm may be obtained from a finite covering of
$\mathcal{L}(t) \times \mathcal{K}$. Hence $\mathcal{R}$ has finite entropy in the
supremum norm. Second, set
$\mathcal{G} = \{y \mapsto g(y)\mathbf{1}_{\mathcal{L}(t)}(y) : \|g\|_{W(\mathcal{L}(t))} \le 1\}$.
Denote by $\mathcal{X}$ the convex hull of $\mathcal{L}(t)$, and consider the
collection of functions
$\widetilde{\mathcal{G}} = \{g : \mathcal{X} \to \mathbb{C} : \|g\|_{W(\mathcal{X})} \le 1\}$.
Then $\widetilde{\mathcal{G}}$ has finite entropy in the supremum norm; see
Kolmogorov and Tikhomirov [1961] and van der Vaart [1994]. Using the surjection
$\widetilde{\mathcal{G}} \to \mathcal{G}$ carrying $g$ to
$g\,\mathbf{1}_{\mathcal{L}(t)}$, that $\mathcal{G}$ has finite entropy in the
supremum norm readily follows. To conclude the proof, since both $\mathcal{R}$ and
$\mathcal{G}$ are uniformly bounded, a finite covering of $\mathcal{F}_3$ of
arbitrary size $\delta$ in the supremum norm may be obtained from finite coverings
of $\mathcal{R}$ and $\mathcal{G}$, which yields a finite covering of
$\mathcal{F}_3$ by $L^1$ brackets of size at most $2\delta$. So $\mathcal{F}_3$ is
a Glivenko-Cantelli class. □



We recall that the limit operator $Q_h$ is given by (3.1). The following lemma gives
useful bounds on $K_h$ and $q_h$, both defined in (3.2).

Lemma 5.3. 1. The function $K_h$ is uniformly bounded from below by some positive
number on $\mathcal{L}(t - \varepsilon_0)$, i.e.,
$\inf\{K_h(x) : x \in \mathcal{L}(t - \varepsilon_0)\} > 0$;

2. The kernel $q_h$ is uniformly bounded, i.e., $\|q_h\|_\infty < \infty$;

3. The differential of $q_h$ with respect to $x$ is uniformly bounded on
$\mathcal{L}(t - \varepsilon_0) \times \mathbb{R}^d$, i.e.,
$\sup\{\|D_x q_h(x,y)\| : (x,y) \in \mathcal{L}(t - \varepsilon_0) \times \mathbb{R}^d\} < \infty$;

4. The Hessian of $q_h$ with respect to $x$ is uniformly bounded on
$\mathcal{L}(t - \varepsilon_0) \times \mathbb{R}^d$, i.e.,
$\sup\{\|D_x^2 q_h(x,y)\| : (x,y) \in \mathcal{L}(t - \varepsilon_0) \times \mathbb{R}^d\} < \infty$.

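Proof. First observe that the statements 2, 3 and 4 are immediate consequences of
statement 1 together with the fact that the function $k_h$ is of class
$\mathcal{C}^2$ with compact support, which implies that $k_h(y - x)$,
$D_x k_h(y - x)$, and $D_x^2 k_h(y - x)$ are uniformly bounded.

To prove statement 1, note that $K_h$ is continuous and that $K_h(x) > 0$ for all
$x \in \mathcal{L}(t)$. Set

$$\alpha(\varepsilon_0) = \inf\{\|D_x f(x)\| : x \in \mathcal{L}_{t-\varepsilon_0}^{t}\}.$$

Let $(x, y) \in \mathcal{L}_{t-\varepsilon_0}^{t} \times \partial\mathcal{L}(t)$. Then

$$\varepsilon_0 \ge f(y) - f(x) \ge \alpha(\varepsilon_0)\, \|y - x\|.$$

Thus, $\|y - x\| \le \varepsilon_0 / \alpha(\varepsilon_0)$ and so

$$\operatorname{dist}(x, \mathcal{L}(t)) \le \frac{\varepsilon_0}{\alpha(\varepsilon_0)} \quad \text{for all } x \in \mathcal{L}_{t-\varepsilon_0}^{t}.$$

Recall from (2.4) that $h/2 > \varepsilon_0 / \alpha(\varepsilon_0)$.
Consequently, for all $x \in \mathcal{L}(t - \varepsilon_0)$, the set
$(x + hB/2) \cap \mathcal{L}(t)$ contains a non-empty, open set $U(x)$. Moreover
$k_h$ is bounded from below by some positive number on $hB/2$ by Assumption 2.
Hence $K_h(x) > 0$ for all $x$ in $\mathcal{L}(t - \varepsilon_0)$ and point 1
follows from the continuity of $K_h$ and the compactness of
$\mathcal{L}(t - \varepsilon_0)$. □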



In order to prove the convergence of 2„ to Q^, we also need to study the uniform 
convergence of K^ji, given in (12.21) . Lemma \5A\ controls the difference between 
h and Kj^, while Lemma [531 controls the ratio of Ki^ over K^ ^. 

Lemma 5.4. As n ^oo, almost surely. 



1. sup 

2. sup 



Knji{x) -Kf,{x 
DxKn,h{x)-DxKh{x 



— and 



^0. 
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Proof. Let 



Let us start with the inequality 

Kn,h{x)-Kh{x)\ < \Knj,{x)-Klj^{x)\ + \kI^^{x) -Kh{x) 
for all X G ^{t — £o) . Using the inequality 



< 



1 



\\h\ 



(5.1) 



we conclude that the first term in (15.11) tends to uniformly in x over ^{t — £q) 
with probability one as n oo, since j{n)/n — t- /i (^(?)) almost surely, and since 
kfi is bounded on W^. 



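Next, for all $x \in \mathcal{L}(t-\varepsilon_0)$, we have

$$|K^{\dagger}_{n,h}(x) - K_h(x)| \le \frac{1}{\mu(\mathcal{L}(t))}\left|\frac{1}{n}\sum_{i=1}^{n} k_h(X_i-x)\big\{\mathbf{1}_{\mathcal{L}_n(t)}(X_i) - \mathbf{1}_{\mathcal{L}(t)}(X_i)\big\}\right| + \frac{1}{\mu(\mathcal{L}(t))}\left|\frac{1}{n}\sum_{i=1}^{n} k_h(X_i-x)\mathbf{1}_{\mathcal{L}(t)}(X_i) - \int_{\mathcal{L}(t)} k_h(y-x)\,\mu(dy)\right|. \qquad (5.2)$$

The first term in (5.2) is bounded by

$$\frac{\|k_h\|_\infty}{\mu(\mathcal{L}(t))}\,\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{\mathcal{L}_n(t)\Delta\mathcal{L}(t)}(X_i),$$

where $\mathcal{L}_n(t)\Delta\mathcal{L}(t)$ denotes the symmetric difference between $\mathcal{L}_n(t)$ and $\mathcal{L}(t)$. Recall that, on the event $\Omega_n$, $\mathcal{L}(t+\varepsilon_n) \subset \mathcal{L}_n(t) \subset \mathcal{L}(t-\varepsilon_n)$. Therefore $\mathcal{L}_n(t)\Delta\mathcal{L}(t) \subset \mathcal{L}_{t-\varepsilon_n}^{t+\varepsilon_n}$ on $\Omega_n$, and so

$$0 \le \frac{1}{n}\sum_{i=1}^{n} \big|\mathbf{1}_{\mathcal{L}_n(t)}(X_i) - \mathbf{1}_{\mathcal{L}(t)}(X_i)\big|\,\mathbf{1}_{\Omega_n} \le \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{A_n}(X_i),$$

where $A_n = \mathcal{L}_{t-\varepsilon_n}^{t+\varepsilon_n}$. Hence by Lemma 5.1 and since $\mathbf{1}_{\Omega_n} \to 1$ almost surely as $n \to \infty$, the first term in (5.2) converges to 0 with probability one as $n \to \infty$.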

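Next, since the collection $\{y \mapsto k_h(y-x)\mathbf{1}_{\mathcal{L}(t)}(y) : x \in \mathcal{L}(t-\varepsilon_0)\}$ is Glivenko-Cantelli by Lemma 5.2, we conclude that

$$\sup_{x\in\mathcal{L}(t-\varepsilon_0)}\left|\frac{1}{n}\sum_{i=1}^{n} k_h(X_i-x)\mathbf{1}_{\mathcal{L}(t)}(X_i) - \int_{\mathcal{L}(t)} k_h(y-x)\,\mu(dy)\right| \to 0,$$

with probability one as $n \to \infty$. This concludes the proof of the first statement.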

The second statement may be proved by developing similar arguments, with $k_h$ replaced by $D_x k_h$, and by noting that the collection of functions $\{y \mapsto D_x k_h(y-x)\mathbf{1}_{\mathcal{L}(t)}(y) : x \in \mathcal{L}(t-\varepsilon_0)\}$ is also Glivenko-Cantelli by Lemma 5.2. □
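The uniform convergence of $K_{n,h}$ to $K_h$ can be observed numerically. The following Python sketch is only an illustration under assumptions that are not part of the text: the data are standard Gaussian on the real line, the level is $t = 0.2$, the similarity function is a triweight kernel, and the true density is used in place of a density estimate when extracting points. The names `K_empirical`, `K_limit` and `sup_gap` are hypothetical.

```python
import numpy as np

def k_h(u, h):
    """Triweight similarity k(u/h): C^2, symmetric, supported on [-h, h] (assumed kernel)."""
    v = np.clip(np.abs(u) / h, 0.0, 1.0)
    return (1.0 - v**2) ** 3

def K_empirical(x, extracted, h):
    """K_{n,h}(x): average of k_h(X_j - x) over the extracted points X_j."""
    return k_h(extracted[None, :] - x[:, None], h).mean(axis=1)

def K_limit(x, a, h, m=4001):
    """K_h(x) for X ~ N(0,1) conditioned on the level set [-a, a] (trapezoid rule)."""
    y = np.linspace(-a, a, m)
    w = np.full(m, y[1] - y[0])
    w[[0, -1]] *= 0.5                      # trapezoid weights
    dens = np.exp(-0.5 * y**2)
    dens /= dens @ w                       # conditional density of X given X in L(t)
    return (k_h(y[None, :] - x[:, None], h) * dens) @ w

def sup_gap(n, h=0.5, t=0.2, seed=0):
    """Sup over a grid of L(t) of |K_{n,h} - K_h| for one simulated sample of size n."""
    a = np.sqrt(-2.0 * np.log(t * np.sqrt(2.0 * np.pi)))   # L(t) = [-a, a]
    sample = np.random.default_rng(seed).standard_normal(n)
    extracted = sample[np.abs(sample) <= a]                 # points with f(X_i) >= t
    x = np.linspace(-a, a, 50)
    return np.max(np.abs(K_empirical(x, extracted, h) - K_limit(x, a, h)))
```

Calling `sup_gap(n)` for increasing `n` gives a rough picture of the rate at which the supremum in statement 1 shrinks in this toy setting.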



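Lemma 5.5. As $n \to \infty$, almost surely,

1. $\sup_{x\in\mathcal{L}(t)} \left|\dfrac{K_h(\varphi_n(x))}{K_{n,h}(\varphi_n(x))} - 1\right| \to 0$, and

2. $\sup_{x\in\mathcal{L}(t)} \left\|D_x\left[\dfrac{K_h(\varphi_n(x))}{K_{n,h}(\varphi_n(x))}\right]\right\| \to 0$.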



Proof. First of all, $K_h$ is uniformly continuous on $\mathcal{L}(t-\varepsilon_0)$ since $K_h$ is continuous and since $\mathcal{L}(t-\varepsilon_0)$ is compact. Moreover, $\varphi_n$ converges uniformly to the identity map of $\mathcal{L}(t)$ by Lemma A.1. Hence

$$\sup_{x\in\mathcal{L}(t)} |K_h(\varphi_n(x)) - K_h(x)| \to 0 \quad \text{as } n \to \infty,$$

and since $K_{n,h}$ converges uniformly to $K_h$ with probability one as $n \to \infty$ by Lemma 5.4, this proves 1.



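We have

$$D_x\left[\frac{K_h(\varphi_n(x))}{K_{n,h}(\varphi_n(x))}\right] = \frac{D_x\varphi_n(x)}{K_{n,h}(\varphi_n(x))^2} \times \Big[K_{n,h}(\varphi_n(x))\,D_xK_h(\varphi_n(x)) - K_h(\varphi_n(x))\,D_xK_{n,h}(\varphi_n(x))\Big].$$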
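Since $D_x\varphi_n(x)$ converges to the identity matrix $I_d$ uniformly over $x \in \mathcal{L}(t)$ by Lemma A.1, $\|D_x\varphi_n(x)\|$ is bounded uniformly over $n$ and $x \in \mathcal{L}(t)$ by some positive constant $C_\varphi$. Furthermore the map $x \mapsto K_{n,h}(x)$ is bounded from below over $\mathcal{L}(t)$ by some positive constant $k_{\min}$ independent of $x$ because i) $\inf_{x\in\mathcal{L}(t-\varepsilon_0)} K_h(x) > 0$ by Lemma 5.3, and ii) $\sup_{x\in\mathcal{L}(t-\varepsilon_0)} |K_{n,h}(x) - K_h(x)| \to 0$ by Lemma 5.4. Hence

$$\left\|D_x\left[\frac{K_h(\varphi_n(x))}{K_{n,h}(\varphi_n(x))}\right]\right\| \le \frac{C_\varphi}{k_{\min}^2}\,\big\|K_{n,h}(y)\,D_xK_h(y) - K_h(y)\,D_xK_{n,h}(y)\big\|,$$

where we have set $y = \varphi_n(x)$, which belongs to $\mathcal{L}(t-\varepsilon_n) \subset \mathcal{L}(t-\varepsilon_0)$. At last, Lemma 5.4 gives

$$\sup_{y\in\mathcal{L}(t-\varepsilon_0)} \big\|K_{n,h}(y)\,D_xK_h(y) - K_h(y)\,D_xK_{n,h}(y)\big\| \to 0 \quad \text{almost surely,}$$

as $n \to \infty$, which proves 2. □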



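We are now almost ready to prove the uniform convergence of empirical operators. The following lemma is a consequence of Lemma 5.2.

Lemma 5.6. Let $r : \mathcal{L}(t-\varepsilon_0) \times \mathbb{R}^d \to \mathbb{R}$ be a continuously differentiable function with compact support such that (i) $r$ is uniformly bounded on $\mathcal{L}(t-\varepsilon_0) \times \mathbb{R}^d$, i.e., $\|r\|_\infty < \infty$, and (ii) the differential $D_x r$ with respect to $x$ is uniformly bounded on $\mathcal{L}(t-\varepsilon_0) \times \mathbb{R}^d$, i.e., $\|D_x r\|_\infty := \sup\{\|D_x r(x,y)\| : (x,y) \in \mathcal{L}(t-\varepsilon_0) \times \mathbb{R}^d\} < \infty$. Define the linear operators $R_n$ and $R$ on $W(\mathcal{L}(t))$ respectively by

$$R_n g(x) := \int r(\varphi_n(x),y)\,g(\varphi_n^{-1}(y))\,\mathbb{P}_n^t(dy), \qquad R g(x) := \int r(x,y)\,g(y)\,\mu^t(dy).$$

Then, as $n \to \infty$,

$$\sup\{\|R_n g - R g\|_\infty : \|g\|_W \le 1\} \to 0 \quad \text{almost surely.}$$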
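Proof. Set

$$S_n g(x) := \frac{1}{n\,\mu(\mathcal{L}(t))}\sum_{i=1}^{n} r(\varphi_n(x),X_i)\,g(\varphi_n^{-1}(X_i))\,\mathbf{1}_{\mathcal{L}_n(t)}(X_i),$$

$$T_n g(x) := \frac{1}{n\,\mu(\mathcal{L}(t))}\sum_{i=1}^{n} r(\varphi_n(x),X_i)\,g(X_i)\,\mathbf{1}_{\mathcal{L}(t)}(X_i),$$

$$U_n g(x) := \frac{1}{n\,\mu(\mathcal{L}(t))}\sum_{i=1}^{n} r(x,X_i)\,g(X_i)\,\mathbf{1}_{\mathcal{L}(t)}(X_i),$$

and consider the inequality

$$|R_n g(x) - R g(x)| \le |R_n g(x) - S_n g(x)| + |S_n g(x) - T_n g(x)| + |T_n g(x) - U_n g(x)| + |U_n g(x) - R g(x)| \qquad (5.3)$$

for all $x \in \mathcal{L}(t)$ and all $g \in W(\mathcal{L}(t))$.

The first term in (5.3) is bounded uniformly by

$$|R_n g(x) - S_n g(x)| \le \left|\frac{n}{j(n)} - \frac{1}{\mu(\mathcal{L}(t))}\right| \|r\|_\infty \|g\|_\infty,$$

and since $j(n)/n$ tends to $\mu(\mathcal{L}(t))$ almost surely as $n \to \infty$, we conclude that

$$\sup\{\|R_n g - S_n g\|_\infty : \|g\|_W \le 1\} \to 0 \quad \text{a.s. as } n \to \infty. \qquad (5.4)$$

For the second term in (5.3), we have

$$|S_n g(x) - T_n g(x)| \le \frac{\|r\|_\infty}{\mu(\mathcal{L}(t))}\,\frac{1}{n}\sum_{i=1}^{n} |g_n(X_i)|, \qquad (5.5)$$

where $g_n$ is the function defined on the whole space $\mathbb{R}^d$ by

$$g_n(x) = g(\varphi_n^{-1}(x))\,\mathbf{1}_{\mathcal{L}_n(t)}(x) - g(x)\,\mathbf{1}_{\mathcal{L}(t)}(x).$$

Consider the partition of $\mathbb{R}^d$ given by $\mathbb{R}^d = B_{1,n} \cup B_{2,n} \cup B_{3,n} \cup B_{4,n}$, where

$$B_{1,n} := \mathcal{L}_n(t) \cap \mathcal{L}(t), \quad B_{2,n} := \mathcal{L}_n(t) \cap \mathcal{L}(t)^c, \quad B_{3,n} := \mathcal{L}_n(t)^c \cap \mathcal{L}(t), \quad B_{4,n} := \mathcal{L}_n(t)^c \cap \mathcal{L}(t)^c.$$

The sum over $i$ in (5.5) may be split into four parts as

$$\frac{1}{n}\sum_{i=1}^{n} |g_n(X_i)| = I_1(x,g) + I_2(x,g) + I_3(x,g) + I_4(x,g), \qquad (5.6)$$

where

$$I_k(x,g) := \frac{1}{n}\sum_{i=1}^{n} |g_n(X_i)|\,\mathbf{1}\{X_i \in B_{k,n}\}.$$

First, $I_4(x,g) = 0$ since $g_n$ is identically 0 on $B_{4,n}$. Second,

$$I_2(x,g) + I_3(x,g) \le \|g\|_\infty\,\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{\mathcal{L}(t)\Delta\mathcal{L}_n(t)}(X_i). \qquad (5.7)$$

Applying Lemma 5.1 together with the almost sure convergence of $\mathbf{1}_{\Omega_n}$ to 1, we obtain that

$$\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{\mathcal{L}(t)\Delta\mathcal{L}_n(t)}(X_i) \to 0 \quad \text{almost surely.} \qquad (5.8)$$

Third,

$$I_1(x,g) \le \sup_{x\in B_{1,n}} |g(\varphi_n^{-1}(x)) - g(x)| \le \|D_x g\|_\infty \sup_{x\in\mathcal{L}(t-\varepsilon_n)} \|\varphi_n^{-1}(x) - x\| \le \|D_x g\|_\infty \sup_{x\in\mathcal{L}(t)} \|x - \varphi_n(x)\| \to 0 \qquad (5.9)$$

as $n \to \infty$ by Lemma A.1. Thus, combining (5.5), (5.6), (5.7), (5.8) and (5.9) leads to

$$\sup\{\|S_n g - T_n g\|_\infty : \|g\|_W \le 1\} \to 0 \quad \text{a.s. as } n \to \infty. \qquad (5.10)$$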

For the third term in (5.3), using the inequality

$$|r(\varphi_n(x),X_i) - r(x,X_i)| \le \|D_x r\|_\infty \sup_{x\in\mathcal{L}(t)} \|\varphi_n(x) - x\|,$$

we deduce that

$$|T_n g(x) - U_n g(x)| \le \frac{1}{\mu(\mathcal{L}(t))}\,\|g\|_\infty \|D_x r\|_\infty \sup_{x\in\mathcal{L}(t)} \|\varphi_n(x) - x\|,$$

and so

$$\sup\{\|T_n g - U_n g\|_\infty : \|g\|_W \le 1\} \to 0 \quad \text{a.s. as } n \to \infty, \qquad (5.11)$$

by Lemma A.1.

At last, for the fourth term in (5.3), since the function $r$ satisfies the conditions of the second statement in Lemma 5.2, we conclude by Lemma 5.2 that

$$\sup\{\|U_n g - R g\|_\infty : \|g\|_W \le 1\} \to 0 \quad \text{a.s. as } n \to \infty. \qquad (5.12)$$

Finally, substituting (5.4), (5.10), (5.11) and (5.12) into (5.3) yields the desired result. □



5.2 Proof of Theorem 3.1 



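We will prove that, as $n \to \infty$, almost surely,

$$\sup\{\|\widehat{Q}_{n,h} g - Q_h g\|_\infty : \|g\|_W \le 1\} \to 0 \qquad (5.13)$$

and

$$\sup\{\|D_x[\widehat{Q}_{n,h} g] - D_x[Q_h g]\|_\infty : \|g\|_W \le 1\} \to 0. \qquad (5.14)$$

To this aim, we introduce the operator $\widetilde{Q}_{n,h}$ acting on $W(\mathcal{L}(t))$ as

$$\widetilde{Q}_{n,h} g(x) = \int q_h(\varphi_n(x),y)\,g(\varphi_n^{-1}(y))\,\mathbb{P}_n^t(dy).$$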



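Proof of (5.13). For all $g \in W(\mathcal{L}(t))$, we have

$$\|\widehat{Q}_{n,h} g - Q_h g\|_\infty \le \|\widehat{Q}_{n,h} g - \widetilde{Q}_{n,h} g\|_\infty + \|\widetilde{Q}_{n,h} g - Q_h g\|_\infty. \qquad (5.15)$$

First, by Lemma 5.3, the function $r = q_h$ satisfies the condition in Lemma 5.6, so that

$$\sup\{\|\widetilde{Q}_{n,h} g - Q_h g\|_\infty : \|g\|_W \le 1\} \to 0 \qquad (5.16)$$

with probability one as $n \to \infty$.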
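Next, since $\|q_h\|_\infty < \infty$ by Lemma 5.3, there exists a finite constant $C_h$ such that

$$\|\widetilde{Q}_{n,h} g\|_\infty \le C_h \quad \text{for all } n \text{ and all } g \text{ with } \|g\|_W \le 1. \qquad (5.17)$$

By definition of $q_{n,h}$, for all $x, y$ in the level set $\mathcal{L}(t)$, we have

$$q_{n,h}(x,y) = \frac{K_h(x)}{K_{n,h}(x)}\,q_h(x,y).$$

So

$$|\widehat{Q}_{n,h} g(x) - \widetilde{Q}_{n,h} g(x)| = \left|\frac{K_h(\varphi_n(x))}{K_{n,h}(\varphi_n(x))} - 1\right| |\widetilde{Q}_{n,h} g(x)| \le C_h \sup_{x\in\mathcal{L}(t)} \left|\frac{K_h(\varphi_n(x))}{K_{n,h}(\varphi_n(x))} - 1\right|, \qquad (5.18)$$

where $C_h$ is as in (5.17). Applying Lemma 5.5 yields

$$\sup\{\|\widehat{Q}_{n,h} g - \widetilde{Q}_{n,h} g\|_\infty : \|g\|_W \le 1\} \to 0 \qquad (5.19)$$

with probability one as $n \to \infty$. Substituting (5.16) and (5.19) into (5.15) proves (5.13).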



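Proof of (5.14). We have

$$\|D_x[\widehat{Q}_{n,h} g] - D_x[Q_h g]\|_\infty \le \|D_x[\widehat{Q}_{n,h} g] - D_x[\widetilde{Q}_{n,h} g]\|_\infty + \|D_x[\widetilde{Q}_{n,h} g] - D_x[Q_h g]\|_\infty. \qquad (5.20)$$

The second term in (5.20) is bounded by

$$\|D_x[\widetilde{Q}_{n,h} g] - D_x[Q_h g]\|_\infty \le \|D_x\varphi_n\|_\infty \|R_n g - R g\|_\infty$$

where

$$R_n g(x) := \int (D_x q_h)(\varphi_n(x),y)\,g(\varphi_n^{-1}(y))\,\mathbb{P}_n^t(dy) \quad \text{and} \quad R g(x) := \int (D_x q_h)(x,y)\,g(y)\,\mu^t(dy).$$

By Lemma A.1, $x \mapsto D_x\varphi_n(x)$ converges to the identity matrix of $\mathbb{R}^d$, uniformly in $x$ over $\mathcal{L}(t)$. So $\|D_x\varphi_n(x)\|$ is bounded by some finite constant $C_\varphi$ uniformly over $n$ and $x \in \mathcal{L}(t)$ and

$$\|D_x[\widetilde{Q}_{n,h} g] - D_x[Q_h g]\|_\infty \le C_\varphi \|R_n g - R g\|_\infty.$$

By Lemma 5.3, the map $r : (x,y) \mapsto D_x q_h(x,y)$ satisfies the conditions in Lemma 5.6. Thus, $\|R_n g - R g\|_\infty$ converges to 0 almost surely, uniformly over $g$ in the unit ball of $W(\mathcal{L}(t))$, and we deduce that

$$\sup\{\|D_x[\widetilde{Q}_{n,h} g] - D_x[Q_h g]\|_\infty : \|g\|_W \le 1\} \to 0 \quad \text{a.s. as } n \to \infty. \qquad (5.21)$$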



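For the first term in (5.20), observe first that there exists a constant $C'_h$ such that

$$\|R_n g\|_\infty \le C'_h \quad \text{for all } n \text{ and all } g \text{ with } \|g\|_W \le 1, \qquad (5.22)$$

by Lemma 5.3.

On the one hand, we have

$$D_x[q_{n,h}(\varphi_n(x),y)] = \frac{K_h(\varphi_n(x))}{K_{n,h}(\varphi_n(x))}\,D_x\varphi_n(x)\,(D_x q_h)(\varphi_n(x),y) + D_x\left[\frac{K_h(\varphi_n(x))}{K_{n,h}(\varphi_n(x))}\right] q_h(\varphi_n(x),y).$$

Hence,

$$D_x[\widehat{Q}_{n,h} g](x) = \frac{K_h(\varphi_n(x))}{K_{n,h}(\varphi_n(x))}\,D_x\varphi_n(x)\,R_n g(x) + D_x\left[\frac{K_h(\varphi_n(x))}{K_{n,h}(\varphi_n(x))}\right] \widetilde{Q}_{n,h} g(x).$$

On the other hand, since $D_x[q_h(\varphi_n(x),y)] = D_x\varphi_n(x)\,(D_x q_h)(\varphi_n(x),y)$,

$$D_x[\widetilde{Q}_{n,h} g](x) = D_x\varphi_n(x)\,R_n g(x).$$

Thus,

$$D_x[\widehat{Q}_{n,h} g](x) - D_x[\widetilde{Q}_{n,h} g](x) = D_x\left[\frac{K_h(\varphi_n(x))}{K_{n,h}(\varphi_n(x))}\right] \widetilde{Q}_{n,h} g(x) + \left(\frac{K_h(\varphi_n(x))}{K_{n,h}(\varphi_n(x))} - 1\right) D_x\varphi_n(x)\,R_n g(x).$$

Using the inequalities (5.17) and (5.22), we obtain

$$\|D_x[\widehat{Q}_{n,h} g] - D_x[\widetilde{Q}_{n,h} g]\|_\infty \le C_h \sup_{x\in\mathcal{L}(t)} \left\|D_x\left[\frac{K_h(\varphi_n(x))}{K_{n,h}(\varphi_n(x))}\right]\right\| + C'_h C_\varphi \sup_{x\in\mathcal{L}(t)} \left|\frac{K_h(\varphi_n(x))}{K_{n,h}(\varphi_n(x))} - 1\right|,$$

and by applying Lemma 5.5, we deduce that

$$\sup\{\|D_x[\widehat{Q}_{n,h} g] - D_x[\widetilde{Q}_{n,h} g]\|_\infty : \|g\|_W \le 1\} \to 0 \quad \text{a.s. as } n \to \infty. \qquad (5.23)$$

Substituting (5.21) and (5.23) into (5.20) proves (5.14). □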



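6 Proof of Corollary 3.2

Let us start with the following proposition, which relates the spectrum of the functional operator $\widehat{Q}_{n,h}$ with the one of the matrix $\mathbf{Q}_{n,h}$.

Proposition 6.1. On $\Omega_n$, we have $\pi_n\Phi_n\widehat{Q}_{n,h} = \mathbf{Q}_{n,h}\pi_n\Phi_n$ and the spectrum of the functional operator $\widehat{Q}_{n,h}$ is $\sigma(\widehat{Q}_{n,h}) = \{0\} \cup \sigma(\mathbf{Q}_{n,h})$.

Proof. Recall that the evaluation map $\pi_n$ defined in (2.3) is such that $\mathbf{Q}_{n,h}\pi_n = \pi_n Q_{n,h}$, and that, on $\Omega_n$, $Q_{n,h} = \Phi_n\widehat{Q}_{n,h}\Phi_n^{-1}$. Moreover, since $\widehat{Q}_{n,h}$ and $Q_{n,h}$ are conjugate, their spectra are equal. Thus, there remains to show that $\sigma(Q_{n,h}) = \{0\} \cup \sigma(\mathbf{Q}_{n,h})$.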
{0}UC7(Q„,^). 
Remark that $Q_{n,h}$ is a finite rank operator, and that its range is spanned by the maps $x \mapsto q_{n,h}(x,X_j)$, for $j \in J(n)$. Thus its spectrum is composed of 0 and its eigenvalues. By the relation $\mathbf{Q}_{n,h}\pi_n = \pi_n Q_{n,h}$, it immediately follows that if $g$ is an eigenfunction of $Q_{n,h}$ with eigenvalue $\lambda$, then $V = \pi_n(g)$ is an eigenvector of $\mathbf{Q}_{n,h}$ with eigenvalue $\lambda$. Conversely, if $\{V_j\}_j$ is an eigenvector of $\mathbf{Q}_{n,h}$, then with some easy algebra, it may be verified that the function $g$ defined by

$$g(x) := \sum_{j\in J(n)} q_{n,h}(x,X_j)\,V_j$$

is an eigenfunction of $Q_{n,h}$ with the same eigenvalue. □
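The correspondence between eigenvectors of the matrix and eigenfunctions of the finite rank operator can be checked numerically. The sketch below is a minimal illustration under assumptions that are not in the text: one-dimensional points, a triweight similarity function, and the candidate extension $g(x) = \sum_j q_{n,h}(x,X_j)V_j$. The function names (`q_nh`, `transition_matrix`, `eigenpairs`, `extend`, `apply_operator`) are hypothetical.

```python
import numpy as np

def k_h(u, h):
    """Triweight similarity k(u/h), an assumed stand-in for the kernel of the text."""
    v = np.clip(np.abs(u) / h, 0.0, 1.0)
    return (1.0 - v**2) ** 3

def q_nh(x, points, h):
    """Rows x, columns X_j: q_{n,h}(x, X_j) = k_h(X_j - x) / K_{n,h}(x)."""
    kern = k_h(points[None, :] - np.atleast_1d(x)[:, None], h)
    return kern / kern.mean(axis=1, keepdims=True)

def transition_matrix(points, h):
    """Matrix with entries q_{n,h}(X_i, X_j) / j(n); its rows sum to one."""
    return q_nh(points, points, h) / len(points)

def eigenpairs(points, h):
    """Real eigenpairs of the matrix, obtained from its symmetric conjugate."""
    kern = k_h(points[None, :] - points[:, None], h)
    d = kern.sum(axis=1)
    lam, U = np.linalg.eigh(kern / np.sqrt(np.outer(d, d)))
    order = np.argsort(-lam)
    return lam[order], (U / np.sqrt(d)[:, None])[:, order]

def extend(V, points, h):
    """Candidate eigenfunction g(x) = sum_j q_{n,h}(x, X_j) V_j."""
    return lambda x: q_nh(x, points, h) @ V

def apply_operator(g, points, h):
    """(Q g)(x) = (1 / j(n)) sum_j q_{n,h}(x, X_j) g(X_j)."""
    return lambda x: q_nh(x, points, h) @ g(points) / len(points)
```

For an eigenpair $(\lambda, V)$ of the matrix, evaluating `apply_operator(extend(V, ...), ...)` on a grid should reproduce $\lambda$ times `extend(V, ...)` on that grid, provided every grid point has at least one sample point within distance $h$.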

The spectrum of $Q_h$ may be decomposed as $\sigma(Q_h) = \sigma_1(Q_h) \cup \sigma_2(Q_h)$, where $\sigma_1(Q_h) = \{1\}$ and where $\sigma_2(Q_h) = \sigma(Q_h) \setminus \{1\}$. Since 1 is an isolated eigenvalue, there exists $\eta_0$ in the open interval $(0,1)$ such that $\sigma(Q_h) \cap \{z \in \mathbb{C} : |z-1| < \eta_0\}$ is reduced to the singleton $\{1\}$. Moreover, 1 is an eigenvalue of $Q_h$ of multiplicity $\ell$, by Proposition C.2. Hence by Theorem B.1, $W(\mathcal{L}(t))$ decomposes into $W(\mathcal{L}(t)) = M_1 \oplus M_2$ where $\dim(M_1) = \ell$.

Split the spectrum of $\widehat{Q}_{n,h}$ as $\sigma(\widehat{Q}_{n,h}) = \sigma_1(\widehat{Q}_{n,h}) \cup \sigma_2(\widehat{Q}_{n,h})$, where

$$\sigma_1(\widehat{Q}_{n,h}) = \sigma(\widehat{Q}_{n,h}) \cap \{z \in \mathbb{C} : |z-1| < \eta_0\}.$$

By Theorem B.1, this decomposition of the spectrum of $\widehat{Q}_{n,h}$ yields a decomposition of $W(\mathcal{L}(t))$ as $W(\mathcal{L}(t)) = M_{n,1} \oplus M_{n,2}$, where $M_{n,1}$ and $M_{n,2}$ are stable subspaces under $\widehat{Q}_{n,h}$. Statements 4 and 6 of Theorem B.2, together with Proposition 6.1, give the following convergences.

Proposition 6.2. The first $\ell$ eigenvalues $\lambda_{n,1}, \lambda_{n,2}, \dots, \lambda_{n,\ell}$ of $\mathbf{Q}_{n,h}$ converge to 1 almost surely as $n \to \infty$ and there exists $\eta_0 > 0$ such that, for all $j > \ell$, $\lambda_{n,j}$ belongs to $\{z : |z-1| \ge \eta_0\}$ for $n$ large enough, with probability one.
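The separation of the spectrum described in Proposition 6.2 is easy to observe on simulated data. The sketch below assumes, purely for illustration, two uniform clusters in the plane at distance larger than $h$ and a triweight similarity function, and it works directly with points that already lie in the two components (no density estimation step). The names `transition_matrix` and `sorted_spectrum` are hypothetical.

```python
import numpy as np

def transition_matrix(points, h):
    """Row-normalized similarity matrix built from a compactly supported kernel."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    v = np.clip(dist / h, 0.0, 1.0)
    K = (1.0 - v**2) ** 3          # vanishes as soon as the distance reaches h
    return K / K.sum(axis=1, keepdims=True)

def sorted_spectrum(Q):
    """Eigenvalues of Q (real in theory, Q being conjugate to a symmetric matrix), decreasing."""
    return np.sort(np.linalg.eigvals(Q).real)[::-1]
```

With two well separated clusters, the two largest eigenvalues returned by `sorted_spectrum` should equal 1 up to rounding, while the third one stays at a visible distance from 1.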

In addition to the convergence of the eigenvalues of $\mathbf{Q}_{n,h}$, the convergence of eigenspaces also holds. More precisely, let $\Pi$ be the projector on $M_1$ along $M_2$ and $\Pi_n$ the projector on $M_{n,1}$ along $M_{n,2}$. Statements 2, 3, 5 and 6 of Theorem B.2 lead to

Proposition 6.3. $\Pi_n$ converges to $\Pi$ in operator norm almost surely and the dimension of $M_{n,1}$ is $\ell$ for all large enough $n$.
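Denote by $E_{n,1}$ the subspace of $\mathbb{R}^{j(n)}$ spanned by the eigenvectors of $\mathbf{Q}_{n,h}$ corresponding to the eigenvalues $\lambda_{n,1}, \dots, \lambda_{n,\ell}$. If $n$ is large enough, we have the following isomorphisms of vector spaces:

$$\Pi_n : M_1 \to M_{n,1} \quad \text{and} \quad \pi_n\Phi_n : M_{n,1} \to E_{n,1}, \qquad (6.1)$$

where, strictly speaking, the isomorphisms are defined by the restriction of $\Pi_n$ and $\pi_n\Phi_n$ to $M_1$ and $M_{n,1}$, respectively.

The functions $g_{n,k} := \Pi_n \mathbf{1}_{\mathcal{C}_k}$, $k = 1, \dots, \ell$, are in $M_{n,1}$ and converge to $\mathbf{1}_{\mathcal{C}_k}$ in $W$-norm. Then, the vectors $\vartheta_{n,k} := \pi_n(g_{n,k} \circ \varphi_n^{-1})$ are in $E_{n,1}$ and, as $n \to \infty$,

$$\vartheta_{n,k,j} = \Pi_n(\mathbf{1}_{\mathcal{C}_k}) \circ \varphi_n^{-1}(X_j) \to \mathbf{1}_{\mathcal{C}_k}(X_j) = \begin{cases} 1 & \text{if } k = k(j), \\ 0 & \text{otherwise.} \end{cases} \qquad (6.2)$$

Since $V_{n,1}, \dots, V_{n,\ell}$ form a basis of $E_{n,1}$, there exists a matrix $\xi_n$ of dimension $\ell \times \ell$ such that

$$\vartheta_{n,k} = \sum_{i=1}^{\ell} \xi_{n,k,i}\,V_{n,i}.$$

Hence the $j$th component of $\vartheta_{n,k}$, for all $j \in J(n)$, may be expressed as

$$\vartheta_{n,k,j} = \sum_{i=1}^{\ell} \xi_{n,k,i}\,V_{n,i,j}.$$

Since $\rho_n(X_j)$ is the vector of $\mathbb{R}^\ell$ with components $\{V_{n,i,j}\}_i$, the vector $\vartheta_{n,\cdot,j} = \{\vartheta_{n,k,j}\}_k$ of $\mathbb{R}^\ell$ is related to $\rho_n(X_j)$ by the linear transformation $\xi_n$, i.e.,

$$\vartheta_{n,\cdot,j} = \xi_n\,\rho_n(X_j).$$

The convergence of $\xi_n\rho_n(X_j)$ to $e_{k(j)}$ then follows from (6.2) and Corollary 3.2 is proved.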
A Geometry of level sets

The proof of the following result is adapted from Theorem 3.1 in Milnor [1963], p. 12, and Theorem 5.2.1 in Jost [1995], p. 176.

Lemma A.1. Let $f : \mathbb{R}^d \to \mathbb{R}$ be a function of class $\mathcal{C}^2$. Let $t \in \mathbb{R}$ and suppose that there exists $\varepsilon_0 > 0$ such that $f^{-1}([t-\varepsilon_0; t+\varepsilon_0])$ is nonempty, compact and contains no critical point of $f$. Let $\{\varepsilon_n\}_n$ be a sequence of positive numbers such that $\varepsilon_n < \varepsilon_0$ for all $n$, and $\varepsilon_n \to 0$ as $n \to \infty$. Then there exists a sequence of diffeomorphisms $\varphi_n : \mathcal{L}(t) \to \mathcal{L}(t-\varepsilon_n)$ carrying $\mathcal{L}(t)$ to $\mathcal{L}(t-\varepsilon_n)$ such that:

1. $\sup_{x\in\mathcal{L}(t)} \|\varphi_n(x) - x\| \to 0$ and

2. $\sup_{x\in\mathcal{L}(t)} \|D_x\varphi_n(x) - I_d\| \to 0$,

as $n \to \infty$, where $D_x\varphi_n$ denotes the differential of $\varphi_n$ and where $I_d$ is the identity matrix on $\mathbb{R}^d$.

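Proof. Recall first that a one-parameter group of diffeomorphisms $\{\varphi_u\}_{u\in\mathbb{R}}$ of $\mathbb{R}^d$ gives rise to a vector field $V$ defined by

$$V_x g = \lim_{u\to 0} \frac{g(\varphi_u(x)) - g(x)}{u}, \quad x \in \mathbb{R}^d,$$

for all smooth functions $g : \mathbb{R}^d \to \mathbb{R}$. Conversely, a smooth vector field which vanishes outside of a compact set generates a unique one-parameter group of diffeomorphisms of $\mathbb{R}^d$; see Lemma 2.4 in Milnor [1963], p. 10, and Theorem 1.6.2 in Jost [1995], p. 42.

Denote the set $\{x \in \mathbb{R}^d : a \le f(x) \le b\}$ by $\mathcal{L}_a^b$, for $a \le b$. Let $\eta : \mathbb{R}^d \to \mathbb{R}$ be the non-negative differentiable function with compact support defined by

$$\eta(x) = \begin{cases} 1/\|D_x f(x)\|^2 & \text{if } x \in \mathcal{L}_{t-\varepsilon_0}^{t}, \\ (t+\varepsilon_0-f(x))/\|D_x f(x)\|^2 & \text{if } x \in \mathcal{L}_{t}^{t+\varepsilon_0}, \\ 0 & \text{otherwise.} \end{cases}$$

Then the vector field $V$ defined by $V_x = \eta(x)\,D_x f(x)$ has compact support $\mathcal{L}_{t-\varepsilon_0}^{t+\varepsilon_0}$, so that $V$ generates a one-parameter group of diffeomorphisms

$$\varphi_u : \mathbb{R}^d \to \mathbb{R}^d, \quad u \in \mathbb{R}.$$

We have

$$D_u[f(\varphi_u(x))] = \langle V, D_x f\rangle_{\varphi_u(x)} \ge 0,$$

since $\eta$ is non-negative. Furthermore,

$$\langle V, D_x f\rangle_{\varphi_u(x)} = 1, \quad \text{if } \varphi_u(x) \in \mathcal{L}_{t-\varepsilon_0}^{t}.$$

Consequently the map $u \mapsto f(\varphi_u(x))$ has constant derivative 1 as long as $\varphi_u(x)$ lies in $\mathcal{L}_{t-\varepsilon_0}^{t}$. This proves the existence of the diffeomorphism $\varphi_n := \varphi_{-\varepsilon_n}$ which carries $\mathcal{L}(t)$ to $\mathcal{L}(t-\varepsilon_n)$.

Note that the map $u \in \mathbb{R} \mapsto \varphi_u(x)$ is the integral curve of $V$ with initial condition $x$. Without loss of generality, suppose that $\varepsilon_n < 1$. For all $x$ in $\mathcal{L}_{t-\varepsilon_0}^{t+\varepsilon_0}$, we have

$$\|\varphi_n(x) - x\| \le \int_{-\varepsilon_n}^{0} \|D_u(\varphi_u(x))\|\,du \le \varepsilon_n/\beta(\varepsilon_n) \le \varepsilon_n/\beta(\varepsilon_0)$$

where we have set

$$\beta(\varepsilon) := \inf\{\|D_x f(x)\| : x \in \mathcal{L}_{t-\varepsilon}^{t+\varepsilon}\} > 0.$$

This proves statement 1, since $\varphi_n(x) - x$ is identically 0 on $\mathcal{L}(t+\varepsilon_0)$.

For statement 2, observe that $\varphi_u(x)$ satisfies the relation

$$\varphi_u(x) - x = \int_0^u D_v(\varphi_v(x))\,dv = \int_0^u V(\varphi_v(x))\,dv.$$

Differentiating with respect to $x$ yields

$$D_x\varphi_u(x) - I_d = \int_0^u D_x\varphi_v(x) \circ D_xV(\varphi_v(x))\,dv.$$

Since $f$ is of class $\mathcal{C}^2$, the two terms inside the integral are uniformly bounded over $\mathcal{L}_{t-\varepsilon_0}^{t+\varepsilon_0}$, so that there exists a constant $C > 0$ such that

$$\|D_x\varphi_n(x) - I_d\| \le C\varepsilon_n,$$

for all $x$ in $\mathcal{L}_{t-\varepsilon_0}^{t+\varepsilon_0}$. Since $\|D_x\varphi_n(x) - I_d\|$ is identically zero on $\mathcal{L}(t+\varepsilon_0)$, this proves statement 2.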

□ 
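The key property of the flow, namely that $f$ changes at unit rate along the integral curves of $D_x f/\|D_x f\|^2$, can be checked with a simple numerical integration. The sketch below is an illustration under assumptions not taken from the text: $f$ is an unnormalized Gaussian bump on $\mathbb{R}^2$, the cutoff is ignored (so the integration must stay in the region where $\eta = 1/\|D_x f\|^2$), and a classical fourth-order Runge-Kutta scheme is used. The names `f`, `grad_f` and `flow` are hypothetical.

```python
import numpy as np

def f(x):
    """Assumed example function: an unnormalized Gaussian bump on R^d."""
    return np.exp(-0.5 * np.dot(x, x))

def grad_f(x):
    """Gradient of the example function."""
    return -x * f(x)

def field(x):
    """Vector field V = grad f / ||grad f||^2, valid away from critical points."""
    g = grad_f(x)
    return g / np.dot(g, g)

def flow(x, u, steps=200):
    """Approximate phi_u(x) by integrating dx/du = V(x) with RK4 (u may be negative)."""
    x = np.asarray(x, dtype=float)
    du = u / steps
    for _ in range(steps):
        k1 = field(x)
        k2 = field(x + 0.5 * du * k1)
        k3 = field(x + 0.5 * du * k2)
        k4 = field(x + du * k3)
        x = x + du * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x
```

With `u = -eps`, the point `flow(x, -eps)` plays the role of $\varphi_n(x)$ for $\varepsilon_n$ equal to `eps`: the value of `f` drops by `eps`, and the displacement is at most `eps` divided by the smallest gradient norm met along the path.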



B Continuity of an isolated finite set of eigenvalues 

In brief, the spectrum $\sigma(T)$ of a bounded linear operator $T$ on a Banach space is upper semi-continuous in $T$, but not lower semi-continuous; see Kato [1995], IV.§3.1 and IV.§3.2. However, an isolated finite set of eigenvalues of $T$ is continuous in $T$, as stated in Theorem B.2 below.

Let $T$ be a bounded operator on the $\mathbb{C}$-Banach space $E$ with spectrum $\sigma(T)$. Let $\sigma_1(T)$ be a finite set of eigenvalues of $T$. Set $\sigma_2(T) = \sigma(T) \setminus \sigma_1(T)$ and suppose that $\sigma_1(T)$ is separated from $\sigma_2(T)$ by a rectifiable, simple, and closed curve $\Gamma$. Assume that a neighborhood of $\sigma_1(T)$ is enclosed in the interior of $\Gamma$. Then we have the following theorem; see Kato [1995], III.§6.4 and III.§6.5.



Theorem B.1 (Separation of the spectrum). The Banach space $E$ decomposes into a pair of supplementary subspaces as $E = M_1 \oplus M_2$ such that $T$ maps $M_j$ into $M_j$ ($j = 1,2$) and the spectrum of the operator induced by $T$ on $M_j$ is $\sigma_j(T)$ ($j = 1,2$). If additionally the total multiplicity $m$ of $\sigma_1(T)$ is finite, then $\dim(M_1) = m$.

Moreover, the following theorem states that a finite system of eigenvalues of $T$, as well as the decomposition of $E$ of Theorem B.1, depends continuously on $T$; see Kato [1995], IV.§3.5. Let $\{T_n\}_n$ be a sequence of operators which converges to $T$ in norm. Denote by $\sigma_1(T_n)$ the part of the spectrum of $T_n$ enclosed in the interior of the closed curve $\Gamma$, and by $\sigma_2(T_n)$ the remainder of the spectrum of $T_n$.

Theorem B.2 (Continuous approximation of the spectral decomposition). There exists a finite integer $n_0$ such that the following holds true.

1. Both $\sigma_1(T_n)$ and $\sigma_2(T_n)$ are nonempty for all $n \ge n_0$ provided this is true for $T$.

2. For each $n \ge n_0$, the Banach space $E$ decomposes into two subspaces as $E = M_{n,1} \oplus M_{n,2}$ in the manner of Theorem B.1, i.e., $T_n$ maps $M_{n,j}$ into itself and the spectrum of $T_n$ on $M_{n,j}$ is $\sigma_j(T_n)$.

3. For all $n \ge n_0$, $M_{n,j}$ is isomorphic to $M_j$.

4. If $\sigma_1(T)$ is a singleton $\{\lambda\}$, then every sequence $\{\lambda_n\}_n$ with $\lambda_n \in \sigma_1(T_n)$ for all $n \ge n_0$ converges to $\lambda$.

5. If $\Pi$ is the projector on $M_1$ along $M_2$ and $\Pi_n$ the projector on $M_{n,1}$ along $M_{n,2}$, then $\Pi_n$ converges in norm to $\Pi$.

6. If the total multiplicity $m$ of $\sigma_1(T)$ is finite, then, for all $n \ge n_0$, the total multiplicity of $\sigma_1(T_n)$ is also $m$ and $\dim(M_{n,1}) = m$.


C Markov chains and limit operator 

For the reader not familiar with Markov chains on a general state space, we begin by summarizing 
the relevant part of the theory. 



C.1 Background material on Markov chains



Let $\{\xi_i\}_{i\ge 0}$ be a Markov chain with state space $\mathcal{S} \subset \mathbb{R}^d$ and transition kernel $q(x,dy)$. We write $P_x$ for the probability measure when the initial state is $x$ and $E_x$ for the expectation with respect to $P_x$. The Markov chain is called (strongly) Feller if the map

$$x \in \mathcal{S} \mapsto Qg(x) := \int_{\mathcal{S}} q(x,dy)\,g(y) = E_x\,g(\xi_1)$$

is continuous for every bounded, measurable function $g$ on $\mathcal{S}$; see Meyn and Tweedie [1993], p. 132. This condition ensures that the chain behaves nicely with the topology of the state space $\mathcal{S}$. The notion of irreducibility expresses the idea that, from an arbitrary initial point, each subset of the state space may be reached by the Markov chain with a positive probability. A Feller chain is said to be open set irreducible if, for all points $x, y$ in $\mathcal{S}$, and every $\eta > 0$,

$$\sum_{n\ge 1} q^n(x, y+\eta B) > 0,$$

where $q^n(x,dy)$ stands for the $n$-step transition kernel; see Meyn and Tweedie [1993], p. 135. Even if open set irreducible, a Markov chain may exhibit a periodic behavior, i.e., there may exist a partition $\mathcal{S} = \mathcal{S}_0 \cup \mathcal{S}_1 \cup \dots \cup \mathcal{S}_N$ of the state space such that, for every initial state $x \in \mathcal{S}_0$,

$$P_x[\xi_1 \in \mathcal{S}_1, \xi_2 \in \mathcal{S}_2, \dots, \xi_N \in \mathcal{S}_N, \xi_{N+1} \in \mathcal{S}_0, \dots] = 1.$$

Such a behavior does not occur if the Feller chain is topologically aperiodic, i.e., if for each initial state $x$, each $\eta > 0$, there exists $n_0$ such that $q^n(x, x+\eta B) > 0$ for every $n \ge n_0$; see Meyn and Tweedie [1993], p. 479.



Next we come to ergodic properties of the Markov chain. A Borel set $A$ of $\mathcal{S}$ is called Harris recurrent if the chain visits $A$ infinitely often with probability 1 when started at any point $x$ of $A$, i.e.,

$$P_x\Big(\sum_{i=0}^{\infty} \mathbf{1}_A(\xi_i) = \infty\Big) = 1$$

for all $x \in A$. The chain is then said to be Harris recurrent if every Borel set $A$ with positive Lebesgue measure is Harris recurrent; see Meyn and Tweedie [1993], p. 204. At least two types of behavior, called evanescence and non-evanescence, may occur. The event $[\xi_n \to \infty]$ denotes the fact that the sample path visits each compact set only finitely many times, and the Markov chain is called non-evanescent if $P_x(\xi_n \to \infty) = 0$ for each initial state $x \in \mathcal{S}$. Specifically, a Feller chain is Harris recurrent if and only if it is non-evanescent; see Meyn and Tweedie [1993], Theorem 9.2.2, p. 212.

The ergodic properties exposed above describe the long time behavior of the chain. A measure $\nu$ on the state space is said to be invariant if

$$\nu(A) = \int_{\mathcal{S}} q(x,A)\,\nu(dx)$$

for every Borel set $A$ in $\mathcal{S}$. If the chain is Feller, open set irreducible, topologically aperiodic and Harris recurrent, it admits a unique (up to constant multiples) invariant measure $\nu$; see Meyn and Tweedie [1993], Theorem 10.0.1, p. 235. In this case, either $\nu(\mathcal{S}) < \infty$ and the chain is called positive, or $\nu(\mathcal{S}) = \infty$ and the chain is called null. The following important result provides one with the limit of the distribution of $\xi_n$ when $n \to \infty$, whatever the initial state is. Assuming that the chain is Feller, open set irreducible, topologically aperiodic and positive Harris recurrent, the sequence of distributions $\{q^n(x,dy)\}_{n\ge 1}$ converges in total variation to $\nu(dy)$, the unique invariant probability distribution; see Theorem 13.3.1 of Meyn and Tweedie [1993], p. 326. That is to say, for every $x$ in $\mathcal{S}$,

$$\sup_g \left|\int_{\mathcal{S}} g(y)\,q^n(x,dy) - \int_{\mathcal{S}} g(y)\,\nu(dy)\right| \to 0 \quad \text{as } n \to \infty,$$

where the supremum is taken over all continuous functions $g$ from $\mathcal{S}$ to $\mathbb{R}$ with $\|g\|_\infty \le 1$.



C.2 Limit properties of $Q_h$

With the definitions and results from the previous paragraph, we may now study the properties of the limit clustering induced by the operator $Q_h$. The transition kernel $q_h(x,dy) := q_h(x,y)\,\mu^t(dy)$ defines a Markov chain with state space $\mathcal{L}(t)$. Recall that $\mathcal{L}(t)$ has $\ell$ connected components $\mathcal{C}_1, \dots, \mathcal{C}_\ell$ and that under Assumption 3, $h$ is strictly smaller than $d_{\min}$, the minimal distance between the connected components.

Proposition C.l. 1. The chain is Feller and topologically aperiodic. 

2. When started at a point x in some connected component of the state space, the chain evolves 
within this connected component only. 

3. When the state space is reduced to some connected component of $\mathcal{L}(t)$, the chain is open set irreducible and positive Harris recurrent.

Proof. 1. Since the similarity function $k_h$ is continuous, with compact support $hB$, the map

$$x \mapsto Q_h g(x) = \int_{\mathcal{L}(t)} q_h(x,dy)\,g(y)$$

is continuous for every bounded, measurable function $g$. Moreover, $k_h$ is bounded from below on $(h/2)B$ by Assumption 2. Thus, for each $x \in \mathcal{L}(t)$, $n \ge 1$ and $\eta > 0$, $q_h^n(x, x+\eta B) > 0$. Hence, the chain is Feller and topologically aperiodic.

2. Without loss of generality, assume that $x \in \mathcal{C}_1$. Let $y$ be a point of $\mathcal{L}(t)$ which does not belong to $\mathcal{C}_1$. Then $\|y-x\| \ge d_{\min} > h$ so that $q_h(x,y) = 0$. Whence,

$$P_x(\xi_1 \in \mathcal{C}_1) = q_h(x,\mathcal{C}_1) = \int_{\mathcal{C}_1} q_h(x,y)\,\mu^t(dy) = \int_{\mathcal{L}(t)} q_h(x,y)\,\mu^t(dy) = 1.$$

3. Assume that the state space is reduced to $\mathcal{C}_1$. Fix $x, y \in \mathcal{C}_1$ and $\eta > 0$. Since $\mathcal{C}_1$ is connected, there exists a finite sequence $x_0, x_1, \dots, x_N$ of points in $\mathcal{C}_1$ such that $x_0 = x$, $x_N = y$, and $\|x_i - x_{i+1}\| \le h/2$ for each $i$. Therefore

$$q_h^N(x, y+\eta B) \ge P_x(\xi_i \in x_i + \eta B \text{ for all } i \le N) > 0$$

which proves that the chain is open set irreducible.

Since $\mathcal{C}_1$ is compact, the chain is non-evanescent, and so it is Harris recurrent. Recall that $k(x) = k(-x)$ from Assumption 2. Therefore $k_h(y-x) = k_h(x-y)$ which yields

$$K_h(x)\,q_h(x,dy)\,\mu^t(dx) = K_h(y)\,q_h(y,dx)\,\mu^t(dy).$$

By integrating the previous relation with respect to $x$ over $\mathcal{C}_1$, one may verify that $K_h(x)\,\mu^t(dx)$ is an invariant measure. At last $\int_{\mathcal{C}_1} K_h(x)\,\mu^t(dx) < \infty$, which proves that the chain is positive. □

Proposition C.2. If $g$ is continuous and $Q_h g = g$, then $g$ is constant on the connected components of $\mathcal{L}(t)$.

Proof. We will prove that $g$ is constant over $\mathcal{C}_1$. Proposition C.1 provides one with a unique invariant measure $\nu_1(dy)$ when the state space is reduced to $\mathcal{C}_1$. Fix $x$ in $\mathcal{C}_1$. Since $g = Q_h g$, $g = Q_h^n g$ for every $n \ge 1$. Moreover by Proposition C.1, the chain is open set irreducible, topologically aperiodic, and positive Harris recurrent on $\mathcal{C}_1$. Thus, $q_h^n(x,dy)$ converges in total variation norm to $\nu_1(dy)$. Specifically,

$$Q_h^n g(x) \to \int_{\mathcal{C}_1} g(y)\,\nu_1(dy) \quad \text{as } n \to \infty.$$

Hence, for every $x$ in $\mathcal{C}_1$,

$$g(x) = \int_{\mathcal{C}_1} g(y)\,\nu_1(dy),$$

and since the last integral does not depend on $x$, it follows that $g$ is a constant function on $\mathcal{C}_1$. □
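A finite-state analogue makes the two propositions concrete. In the sketch below, which rests on assumptions that are not in the text, a single connected component is replaced by a regular grid on $[0,1]$, $\mu^t$ by the uniform measure on the grid, and the similarity function by a triweight kernel. The names `kernel_chain`, `invariant_probability` and `iterate` are hypothetical.

```python
import numpy as np

def kernel_chain(points, h):
    """Transition matrix q = k / K and the weights K for a symmetric compact kernel."""
    v = np.clip(np.abs(points[:, None] - points[None, :]) / h, 0.0, 1.0)
    k = (1.0 - v**2) ** 3              # symmetric: k[i, j] == k[j, i]
    K = k.sum(axis=1)                  # discrete analogue of K_h(x)
    return k / K[:, None], K

def invariant_probability(K):
    """Normalized version of the measure K(x) mu(dx)."""
    return K / K.sum()

def iterate(Q, g, n):
    """Values of Q^n g on the grid."""
    return np.linalg.matrix_power(Q, n) @ g
```

In this discrete setting the weights `K` satisfy the detailed balance relation used in the proof of Proposition C.1, and `iterate(Q, g, n)` flattens to the constant `invariant_probability(K) @ g` as `n` grows, which is the mechanism behind Proposition C.2.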